PREFACE.


The  main  object  of  the  present  Volume  may  be  regarded 
as  being  to  give  a  detailed  description  of  the  basis  and 
practical  application  of  those  modern  statistical  methods 
that  are  associated  with  the  name  of  Professor  Karl  Pearson. 

The  history  of  the  work  is  briefly  as  follows:  In
January,  1903,  Mr.  W.  Palin  Elderton  read  before  the
Institute  of  Actuaries  an  interesting  paper  dealing  with
the  application  of  the  Pearsonian  frequency-curves  to  the
graduation  of  a  mortality  experience,  and  it  was  then  felt 
that  the  discussion  of  that  paper  suffered  considerably 
from  the  fact  that  Professor  Pearson's  methods,  which 
had  attracted  so  much  attention  in  purely  statistical  circles, 
were  comparatively  unfamiliar  to  the  actuarial  profession. 

It  was  therefore  suggested  by  more  than  one  member 
of  the  Council  that  it  would  be  exceedingly  useful  to  the 
profession  if  Mr.  Elderton  would  contribute  to  the  Journal  of 
the  Institute  "an  explanatory  paper  dealing  with  frequency-
curves,  and  giving  illustrations  of  their  use"  based  upon
actuarial  data.  To  this  invitation  Mr.  Elderton  replied 
in  the  most  public-spirited  manner  by  preparing  a  lengthy 
paper  which  forms  the  nucleus  of  the  present  volume.  On 
consideration  it  was,  however,  felt  that  the  work  would 
be  more  generally  useful  if  published  separately  instead
of  in  the  form  of  a  paper  in  the  Journal,  and  Mr.  Elderton
was  good  enough  to  undertake  the  necessary  alterations 
and  additions. 

It  is  hoped  that  in  its  present  form  the  volume  may 
be  of  use  not  only  to  the  actuarial  profession,  but  also  to 
other  classes  of  statistical  students  who  may  be  glad  to 
have  a  connected  account  of  Professor  Pearson's  methods. 

Actuarial  work,  on  its  purely  technical  side,  depends  so 
largely  upon  the  results  of  statistical  enquiries  that 
developments  and  improvements  in  statistical  methods 
must  always  be  of  great  interest  to  Actuaries,  and  the 
profession  is  much  indebted  to  Mr.  Elderton  (who  has  had
exceptional  opportunities  of  becoming  familiar  with  Professor 
Pearson's  work)  for  the  preparation  of  this  volume,  which 
must  have  involved  a  great  expenditure  of  thought  and 
labour.  Time  alone  can  show  whether,  and  to  what  extent, 
the  methods  which  he  has  so  ably  expounded  will  prove 
to  be  of  practical  value  in  actuarial  work,  and  it  would 
as  yet  be  premature  to  express  any  opinion  on  this  point. 
It  may,  therefore,  be  well  to  state  that  the  illustrations 
based  on  actuarial  data  must,  for  the  present,  be  regarded 
rather  as  examples  of  method  than  as  indications  of  any 
official  view  as  to  the  applicability  of  the  methods  to  actuarial 
problems.  The  fact  that  no  such  expression  of  opinion 
by  the  Institute  is  yet  possible,  however,  does  not,  in  the 
slightest    degree,    lessen    the    indebtedness    of    all    actuarial 

F.  B.  W. 


INTRODUCTION  BY  THE  AUTHOR. 


By  the  preparation  of  the  following  pages  an  attempt  is 
made  to  bring  before  Actuaries  the  more  practical  methods 
of  modern  statistical  work.  It  is  difficult  to  tell  how  far 
such  methods  may  prove  useful  in  direct  application  to 
actuarial  problems,  but  even  if  they  should  happen  to  be 
only  a  slight  assistance  it  seems  advisable  for  Actuaries  to 
have  some  knowledge  of  the  contemporary  study  of  a 
subject  connected  with  their  own  work  on  the  theoretical 
side.  It  has  been  necessary  to  exclude  some  recent  work, 
and  in  making  a  selection  it  has  been  decided  that  the 
subject  should  be  dealt  with  more  fully  from  its  practical 
than  its  theoretical  aspect,  and  that,  for  the  present,  at  any 
rate,  it  is  best  to  omit  the  more  difficult  proofs,  namely, 
those  of  the  probable  errors  of  the  coefficients  of  correlation 
and  of  the  formula  for  testing  goodness  of  fit,  while  the 
proof  that  the  method  of  moments  will  probably  give  very 
satisfactory  results  has  been  omitted  because  it  is  not 
necessary  to  an  appreciation  of  moments  as  a  practical 
method  (as  is  obvious  when  we  remember  that  it  was  not 
published  till  the  method  had  been  in  use  for  some  years),
and  also  because  it  seems  to  the  present  writer  that  until 
adjustments  have  been  found  for  all  possible  cases  its 
practical  value  is  somewhat  discounted.  The  mean  square 
contingency  is  dealt  with,  but  the  mean  contingency  is
neglected  because  the  mathematics  leading  to  it  are  more
awkward,  and,  though  the  numerical  work  is  less,  the  results 
are  not  so  satisfactory. 

Some  readers  may  be  surprised  that  the  method  of  least 
squares  finds  no  mention,  but  they  should  bear  in  mind  that 
the  range  of  its  applicability  is  so  limited  that  there  is  a 
growing  tendency  to  put  it  aside  in  curve  fitting,  and  it 
seems  best  to  concentrate  attention  on  those  parts  of  the 
subject  more  likely  to  be  of  permanent  value  to  those  for 
whom  this  book  is  intended.  The  median  is  not  considered 
because  it  has  no  important  bearing  on  the  matter  dealt  with  ; 
and  if  it  should  happen  to  be  required  the  only  thing  wanted 
is  its  definition,  that  it  is  the  position  such  that  the  number  of 
cases  before  it  is  equal  to  the  number  after  it. 

The  coefficient  of  variation  has  not  been  dealt  with,  but 
here  again  it  is  well  for  the  reader  to  know  its  meaning.  In 
comparing  the  way  two  things  vary  it  is  necessary  to 
remember  that  relative  size  influences  not  only  the  mean
but  the  deviation  from  it,  and  in  discussing  variation  this  is
taken  into  account  by  using  one  hundred  times  the  ratio
which  the  standard  deviation  bears  to  the  mean
$\left(\dfrac{100 \times \text{S.D.}}{\text{mean}}\right)$.
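The measure just described, one hundred times the ratio of the standard deviation to the mean, can be written out in a few lines of Python. This is only a sketch: the function name and the two sets of figures are invented for illustration and are not taken from the text.

```python
def coefficient_of_variation(standard_deviation, mean):
    """One hundred times the ratio of the standard deviation to the mean."""
    return 100.0 * standard_deviation / mean

# Hypothetical figures: two series with equal standard deviations
# but means of very different size.
small_scale = coefficient_of_variation(2.0, 10.0)
large_scale = coefficient_of_variation(2.0, 40.0)
```

The same absolute scatter counts for much more, relatively, in the series with the smaller mean, which is the point of the measure.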

Part  II.  will  probably  give  Actuaries  more  difficulty  than 
Part  I.  because  it  deals  with  a  type  of  problem  that  has  at 
present  received  little  attention  from  actuarial  students,  and 
it  is  because  the  direct  bearing  of  Part  II.  on  actuarial  work 
is  somewhat  uncertain  that  it  is  dealt  with  more  in  outline 
than  Part  I.  Here,  as  in  all  other  statistical  study,  examples 
must  be  worked  out  if  the  methods  and  principles  are  to  be 
mastered.  The  reader  who  goes  through  a  book  on  a 
practical   subject   and   does   not   work   out   examples    is    as 



certain  to  encounter  imaginary  and  miss  real  difficulties,  as 
he  is  to  fail  to  obtain  any  satisfactory  knowledge  of  the 
subject. 

The  work  may  appear  to  some  to  demand  more  mathematical 
knowledge  than  most  Actuaries  possess,  and  it  may  therefore 
be  well  to  point  out  that  a  practical  man  can  use  frequency 
curves  and  correlation  reasonably  without  such  knowledge, 
for  the  fact  that  a  curve  he  has  found  agrees  with  the 
statistics  from  which  the  moments  were  obtained  is  a  proof 
that  in  the  particular  case  he  has  obtained  proper  values 
for  the  constants  even  though  he  has  not  followed  the 
mathematical  reasoning  leading  to  the  equations.  It  must 
not  of  course  be  inferred  that  belief  without  proof  is 
considered  advisable,  but  that  it  is  unwise  for  a  practical  man 
to  put  aside  a  practical  subject  which  he  can  test  practically, 
merely  because  he  cannot  follow  some  of  the  proofs. 

Frequency-curves  and  correlation  form  a  subject  in 
which  there  is  still  much  to  be  done  in  spite  of  the  rapid 
progress  that  has  been  made  recently.  There  are  few 
subjects  which  offer  a  richer  field  for  original  work  than
statistical  mathematics  and  its  applications.  In  this  field  the 
reader  will  find  that  in  recent  years  we  are  indebted  to 
Professor  Karl  Pearson  for  the  majority  of  the  work  that 
has  proved  a  success  in  practice,  and  anyone  writing  on 
the  subject  for  practical  men  is  bound  to  follow  in  his 
footsteps.  Those  who  become  interested  in  the  subject  are 
strongly  recommended  to  study  Professor  Pearson's  papers ; 
it  is  not  until  they  have  done  so  that  they  will  fully 
appreciate  the  great  extent  of  his  contribution  to  statistical 
science.  The  present  Author  has  merely  tried  to  bring 
together  some  of  Professor  Pearson's  results  and  give  them 
to  members  of  his  profession  with  examples  that  tend  to  show 


that  actuarial  statistics  can  be  examined  by  his  methods  in 
the  same  way  as  the  statistics  of  biology  and  anthropology. 
May  not  the  continuation  of  such  work  add  some  links  to 
the  chain  of  continuity  and  indicate  a  wider  law  than  an 
actuary  studying  his  own  subject  exclusively  might  be  led 
to  suspect  ? 

As  will  be  readily  appreciated,  the  Author  is  chiefly 
indebted  to  Professor  Pearson,  but  his  indebtedness  is  of  the 
kind  for  which  it  is  impossible  to  offer  formal  thanks ;  such 
thanks  would,  at  their  best,  fail  to  express  the  sense  of 
gratitude  which  prompted  them.  He  has  also  to  acknowledge 
much  kind  help  from  Messrs.  G.  J.  Lidstone  and  John
Spencer,  both  of  whom  read  the  work  in  a  somewhat  different 
form  in  MS.  and  gave  him  many  suggestions  in  connection 
with  the  arrangement  of  the  matter,  and  the  former  has  also 
helped  him  in  many  ways  at  the  later  stages,  while  Messrs. 
S.  Adlard  and  E.  L.  Elderton  have  devoted  a  large  amount
of  time  to  reading  the  proofs,  and  have  suggested  difficulties 
that  would  probably  arise  and  ways  of  removing  them. 
Miss  Ethel  M.  Elderton  has  rendered  assistance  in  some  of
the  calculations  and  in  other  ways. 

W.  P.  E. 

London,  August,  1906.


KEY   TO   THE    ACTUARIAL   TERMS 
AND   SYMBOLS    USED. 


The  following  explanation  of  certain  technical  terms  and  symbols 
that  are  used  in  this  book  is  given  as  an  assistance  to  non-actuarial 
readers.  For  a  fuller  account  of  the  functions  and  notation  reference 
can  be  made  to  the  Text-Book  for  Actuaries  (Parts  I.  and  II.),  and 
to  the  "Account  of  the  Principles  and  Methods  adopted  in  constructing
the  British  Offices'  Life  Tables."  London:  C.  &  E.  Layton,  1903.

When  an  investigation  is  made  into  the  mortality  experienced 
among  lives  assured,  the  number  of  persons  entering  at  each  age,  and 
the  numbers  passing  out  of  observation  at  each  age  owing  to 
(1)  death,  (2)  withdrawal,  by  the  policies  lapsing,  being  surrendered, 
or  terminating  from  some  other  cause,  are  recorded.  The  exposed  to 
risk  of  death  at  age  25  ($E_{25}$)  means  the  number  who  had  the  chance
of  dying  between  ages  25  and  26,  and  were  on  the  average  at  risk  for 
the  whole  year. 

The  number  who  die  between  ages  25  and  26,  divided  by  the 
exposed  to  risk  at  age  25,  gives  the  probability  of  dying  in  a  year 
at  age  25  ($q_{25}$).

When  an  experience  ends  in  any  year,  say  1900,  there  will  be  a 
large  number  of  persons  who  have  been  at  risk,  but  whose  policies  are 
still  in  force;  these  are  called  existing  at  the  close  of  the  observation.



When  a  graduation  has  been  made  the  expected  deaths  are  found  by 
multiplying  the  exposed  to  risk  by  the  graduated  values  of  $q_x$.  The
result  is  then  compared  with  the  actual  deaths. 
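The relations between exposed to risk, deaths, the rate of mortality and the expected deaths may be sketched in Python as follows. All the figures, including the graduated rates, are hypothetical and serve only to show the arithmetic; they come from no table referred to in this book.

```python
# Hypothetical exposed to risk (E_x) and actual deaths at two ages.
exposed = {25: 1000.0, 26: 1200.0}
actual_deaths = {25: 6, 26: 9}

def crude_rate(deaths_x, exposed_x):
    """Ungraduated q_x: deaths between x and x+1 divided by the exposed to risk."""
    return deaths_x / exposed_x

# Assumed graduated values of q_x (illustrative only).
graduated_q = {25: 0.0065, 26: 0.0070}

# Expected deaths: exposed to risk multiplied by the graduated q_x.
expected_deaths = {x: exposed[x] * graduated_q[x] for x in exposed}

# Actual less expected deaths at each age, the comparison made after graduation.
deviations = {x: actual_deaths[x] - expected_deaths[x] for x in exposed}
```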

$O^{NM(5)}$ is the name given to the table of mortality obtained from
the male lives assured by ordinary whole-life without profit policies
between 1863 and 1893, excluding the first five years of assurance.

$O^{M}$ is a table constructed from the similar with-profit assurances
for all durations, and $H^{M}$ (healthy males) is the name given to the
older experience which ended in 1863.

$q_x$ is (see above) the probability of dying in a year.

$p_x$ is the probability of a person aged $x$ living one year.

So if we imagine a stationary community, which a person can only
leave by death, and consider $l_x$ to be the number living at exact
age $x$, then $l_{x+1} = p_x \times l_x$ and $l_{x+2} = p_{x+1} \times l_{x+1}$, and so on.
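The building up of the numbers living from successive probabilities of survival can be sketched in Python. The starting number and the values of $p_x$ below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def build_lx(radix, survival_probabilities):
    """Return [l_x, l_(x+1), ...] using l_(x+1) = p_x * l_x.

    radix is the number living at the starting age; the list of
    probabilities gives p_x for each successive age.
    """
    lx = [radix]
    for p in survival_probabilities:
        lx.append(lx[-1] * p)
    return lx

# Hypothetical values: 1,000 living at the starting age.
living = build_lx(1000.0, [0.9, 0.8, 0.5])
```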

The value at a rate of interest $i$ per unit of a sum of 1 payable if
a person aged $x$ be alive at the end of $n$ years, is therefore

$$\frac{v^n l_{x+n}}{l_x}$$

where $v = (1+i)^{-1}$, and the value of an annuity of 1 would be

$$\frac{v l_{x+1} + v^2 l_{x+2} + \dots}{l_x}$$

Now for convenience in making tables it is well to multiply
numerator and denominator of this expression by $v^x$, and we have as
the value of the annuity

$$\frac{v^{x+1} l_{x+1} + v^{x+2} l_{x+2} + \dots}{v^x l_x} = \frac{D_{x+1} + D_{x+2} + \dots}{D_x} = \frac{N_{x+1}}{D_x}$$

where $D_x = v^x l_x$ and $N_x = D_x + D_{x+1} + \dots$

Similarly $S_x = N_x + N_{x+1} + \dots$

Tables of $D$, $N$, and $S$ are called commutation columns.
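A short Python sketch may show how the commutation columns save labour. It takes $D_x = v^x l_x$ and, as an assumption of the sketch, $N_x = D_x + D_{x+1} + \dots$, so that an annuity with its first payment at the end of the year is $N_{x+1}/D_x$; the function names and the little table of numbers living are hypothetical.

```python
def commutation_columns(lx, i):
    """Return the columns D and N for ages 0, 1, 2, ... given l_x and interest i."""
    v = 1.0 / (1.0 + i)
    D = [v**x * l for x, l in enumerate(lx)]
    N = [0.0] * len(D)
    running = 0.0
    # Sum D from the oldest age backwards so that N[x] = D[x] + D[x+1] + ...
    for x in range(len(D) - 1, -1, -1):
        running += D[x]
        N[x] = running
    return D, N

def annuity_from_columns(lx, i, x):
    """Annuity of 1 payable at the end of each year survived, via N and D."""
    D, N = commutation_columns(lx, i)
    return (N[x + 1] if x + 1 < len(N) else 0.0) / D[x]

def annuity_direct(lx, i, x):
    """The same value from first principles: sum of v^t * l_(x+t) / l_x."""
    v = 1.0 / (1.0 + i)
    return sum(v**t * lx[x + t] for t in range(1, len(lx) - x)) / lx[x]

# Hypothetical numbers living at ages 0 to 5.
example_lx = [1000.0, 900.0, 700.0, 400.0, 100.0, 0.0]
```

Once the columns have been tabulated, the annuity at any age is a single division, whereas the direct method repeats the whole summation for each age.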

$a_{\overline{n}|}$ is written for the value of an annuity of 1 payable for $n$ years
certain, independent of any life, so its value is $v + v^2 + \dots + v^n$.

$\bar{a}_{\overline{n}|}$ is the value of a similar annuity of 1 per annum payable
$m$ times a year when $m$ takes the limit of $\infty$, so that its value is

$$\int_0^n v^t \, dt.$$
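The two annuities-certain can be evaluated in Python as a check on the definitions. The closed forms used here, $(1 - v^n)/i$ for the yearly annuity and $(1 - v^n)/\delta$ with $\delta = \log_e(1+i)$ for the continuous one, are the usual results of summing the series and of integrating $v^t$; the function names are hypothetical.

```python
import math

def annuity_certain(i, n):
    """v + v^2 + ... + v^n, summed term by term."""
    v = 1.0 / (1.0 + i)
    return sum(v**t for t in range(1, n + 1))

def continuous_annuity_certain(i, n):
    """Integral of v^t from 0 to n, which equals (1 - v^n) / delta."""
    v = 1.0 / (1.0 + i)
    delta = math.log(1.0 + i)
    return (1.0 - v**n) / delta

yearly = annuity_certain(0.05, 10)
continuous = continuous_annuity_certain(0.05, 10)
```

The continuous annuity is worth rather more than the yearly one, because each payment is on the average received earlier.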



Colog $p_x$ is the logarithm of the reciprocal of $p_x$, and Makeham's
hypothesis assumes that its value is $A + Bc^x$; but colog $p_x$ =
$\log l_x - \log l_{x+1}$; therefore an alternative way of stating the hypothesis
is $l_x = k s^x g^{c^x}$.
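That the two statements of Makeham's hypothesis agree can be verified numerically. Taking logarithms of $l_x = k s^x g^{c^x}$ and differencing gives colog $p_x = -\log s - (c-1)\log g \cdot c^x$, so that $A = -\log s$ and $B = -(c-1)\log g$. The Python sketch below checks this; the four constants are hypothetical and were chosen only to give plausible-looking figures.

```python
from math import log10

# Hypothetical constants (not taken from any published table).
k, s, g, c = 100000.0, 0.995, 0.999, 1.09

def l(x):
    """Number living at age x under l_x = k * s**x * g**(c**x)."""
    return k * s**x * g ** (c**x)

def colog_p(x):
    """colog p_x = log l_x - log l_(x+1), using common logarithms."""
    return log10(l(x)) - log10(l(x + 1))

# Constants of the first form of the hypothesis, colog p_x = A + B * c**x.
A = -log10(s)
B = -(c - 1.0) * log10(g)
```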

When  valuing  the  policies  in  an  assurance  office  the  actuary 
groups  cases  together  to  save  labour.  When  assurances  are  payable 
at  death  they  are  grouped  according  to  the  year  of  birth,  but  when
they  are  payable  at  a  certain  maturity  age  or  previous  death
(Endowment  Assurances)  they  can  be  grouped  either  according  to
year  of  birth  or  according  to  the  number  of  years  to  run  (unexpired
term).  In  the  latter  case  they  have  to  be  valued  by  finding  an
average  age  at  maturity;  formerly  this  was  done  by  taking  the  mean
of  the  ages,  but  Mr.  G.  J.  Lidstone  has  recently  shown  that  a  much
more  accurate  result  is  reached  by  weighting  the  ages  in  Geometrical
Progression.  The  constants  used  for  this  purpose  are  called  Z.

A  model  office  is  an  imaginary  specimen  office  which  is  used  for 
making  approximate  valuations. 
PART I.

CHAPTER I.

Introductory.

1.  The  ordinary  treatment  of  probability  begins  with  the 
assumption  that  the  chance  that  a  certain  event  will  occur  is 
known,  and  proceeds  to  solve  the  problems  that  arise  from 
the  combination  of  events  or  the  repetition  of  a  particular 
experiment ;  it  proves  that  a  certain  result  is  more  likely  to 
occur  from  experiment  than  any  other,  that  a  result  based  on 
a  limited  number  of  trials  is  unlikely  to  differ  greatly  from 
the  expected  result,  and  that  the  proportional  deviation  from 
the  most  probable  result  will  generally  decrease  as  the  number 
of  trials  is  increased. 

Experiments  can  easily  be  made  to  show  that  the 
theoretical  method  leads  to  results  which  can  be  realised  in 
practice  when  the  probabilities  can  be  estimated  accurately 
beforehand ;  for  example,  various  trials  have  been  made  with 
coin  tossing,  in  which  it  has  been  found  that  if  five  coins  are 
tossed  together  and  the  number  of  them  coming  down  "heads"
is  recorded,  then  the  distribution  of  the  cases  will  agree  with
the  binomial  expansion  $(\tfrac{1}{2} + \tfrac{1}{2})^5$  as  the  ordinary  theory  leads  us
to  expect.  Sequences  of  "heads"  or  "tails"  form  a  series
approximating  to  the  Geometrical  Progression  with  a  common
ratio  of  $\tfrac{1}{2}$,  and  the  drawing  of  cards  from  a  pack  gives  a  result
closely  agreeing  with  the  numbers  that  theoretical  work 
suggests. 
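The theoretical figures with which such coin-tossing trials are compared can be produced by a few lines of Python. This is a sketch only: the function names are hypothetical, and the number of trials is chosen so that the expected frequencies come out as whole numbers.

```python
from math import comb

def expected_heads_distribution(n_coins, n_trials):
    """Expected number of trials showing 0, 1, ..., n_coins heads.

    These are the terms of n_trials * (1/2 + 1/2)**n_coins.
    """
    return [n_trials * comb(n_coins, k) / 2**n_coins for k in range(n_coins + 1)]

def run_length_proportions(longest):
    """Proportions for unbroken sequences of length 1, 2, ..., longest,
    taken as a geometrical progression with common ratio 1/2."""
    return [0.5**r for r in range(1, longest + 1)]

five_coins = expected_heads_distribution(5, 32)
runs = run_length_proportions(4)
```

With 32 tosses of five coins the expected frequencies are simply the binomial coefficients 1, 5, 10, 10, 5, 1, against which the recorded counts may be set.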

2.   It  frequently  happens,  however,  that  the  probabilities  are 
not  known,  and  it  is  impossible  to  tell  whether  we  are  dealing 
with  an  experiment  like  coin  tossing   or  sequences  or  card- 
drawing;  in  fact,  the  only  thing  known  is  the  distribution  of 
the    number    of    cases    into     certain    groups,    and   in    these 
circumstances  the  inverse  problem  of  tracing  the  theoretical 
series   to   which   the  statistics   approximate  may  become  an 
important  matter.     The  difficulty  of  the  subject  is  increased 
because    statistics    do    not    give   the   theoretical    distribution 
exactly,    and  it   is   impossible    to   tell   where   the    differences 
between  the  actual  and  theoretical  results  lie.     To  make  the 
position  clearer  it  will  be  well  to  re-state  the  problem  and  ask 
whether  it  is  possible  to  find  the  theoretical  series  to  which  a 
series  resulting  from    a  statistical   experiment  approximates. 
It   may  be  difficult,  perhaps  impossible,  to  trace  the   simple 
probabilities  corresponding  to  a  given  case,  but  yet  practicable 
to   form  a  reasonable  opinion  of  the  series   of  numbers  that 
might  be   reached  if  the  experiment   could  be    repeated   an 
infinite  number  of  times.     On  turning  to  the  reasons  which 
make  it  advisable  to  find  this  ideal  result  to  which  statistics 
approach,  it  will  be  seen  that  the  elementary  probabilities  are 
not    so    important    as    they    seem    to    be,    and    a    reasonable 
representation  of  the  series  is  of  far  greater  practical  value. 
We  notice  that  one  of  the  first  objects  of  a  statistician  or  an 
actuary    dealing    with    statistical    work    is    to    express    the 
observations  in  a  simple  form   so   that  practical  conclusions 
can  be  easily  drawn  from  the  figures  that  have  been  collected. 
If  the   available   statistics  fall    naturally   into    fifty  or   sixty 
groups  he  has  to  decide  how  they  can  be  arranged  to  bring
out  the  important  features   of  the  problem   on  which  he  is 
working,   and  if  he   can  find    four  or  five   numbers    closely 
connected  with  the  original  series  which  can  be  used  as  an 
index  to  the  whole,  he  can  then  give  the  result  in  a  way  that 
might    assist    comparison    with  similar  statistics,  and   enable 
others  who  have  to  deal  with  the  facts  to  appreciate  the  whole 
distribution  more  readily  than  they  could  do  if  it  remained  in
its  original  form.  The  statistician  has  also  to  supply 
approximate  values  for  intermediate  terms  when  only  a  few 
can  be  obtained  from  his  experience,  or  complete  or  continue 
a  series  when  only  a  part  of  it  is  known.  In  many  cases 
he  has  to  keep  the  same  terms  as  his  original  series,  but 
remove  the  roughnesses  of  material  due  to  limitations  in  the 
number  of  cases  available  for  his  investigation;  that  is,  he 
has  to  graduate  his  data. 

3.  In  reality  these  objects  are  much  alike,  for  if  the  statistical 
tables  can  be  represented  by  an  algebraic  or  transcendental 
formula,  we  can  replace  the  whole  series  of  numbers  by  a  few 
values  (the  constants  in  the  formula)  which,  if  we  deal
systematically  with  the  distributions  we  meet,  facilitate 
comparison  or  enable  us  to  supply  missing  terms,  while  the 
roughness  of  the  original  material  can  be  removed  by  making 
a  suitable  formula  represent  the  original  statistics  as  nearly  as 
possible.  If  a  formula  is  based  on  the  theoretical  considerations, 
it  should  also  give  a  solution  of  the  problem  in  probabilities 
mentioned  at  the  outset,  and  we  see  that  both  the 
practical  and  theoretical  requirements  can  be  dealt  with 
at  the  same  time,  for  the  smooth  series  sought  by  the 
theoretical  student  is  the  same  thing  as  the  formula  required 
for  practical  work. 

4.  The  advantages  of  any  system  of  curves  depend  on  the 
simplicity  of  the  formulae  and  the  number  of  classes  of 
observations  that  can  be  dealt  with  satisfactorily,  for  a 
complicated  expression  is  very  little  improvement  on  the 
original  groups  of  statistics,  and  a  system  which  is  not  capable 
of  general  application  leaves  the  statistician  in  difficulties 
whenever  it  breaks  down.  One  other  thing  is  necessary ;  if 
a  formula  is  known  to  be  a  suitable  one  there  must  be  some 
method  of  finding  the  arithmetical  constants  that  will  give  a 
good  agreement  in  the  particular  case.  Such  a  method,  if  it 
is  to  be  of  practical  use,  must  be  simple,  reliable  and  capable 
of  general  and  systematic  application. 

A  broad  idea  of  the  objects  to  be  accomplished  ought 
to  be  kept  clearly  before  the  mind ;  they  are  likely  to  be 
forgotten  because  of  the  large  amount  of  detail  necessarily 
connected  with  the  subject.  It  is  also  important  because 
the  advantages  of  systematic  treatment  are  often  overlooked, 
and  short  cuts  and  rough  and  ready  methods  are  adopted
to  the  detriment  of  the  work,  and  formulas  having  no 
scientific  basis  and  having  no  connection  with  others 
suitable  to  similar  cases  are  sometimes  used  in  rather 
haphazard  fashion  by  statisticians.  The  consequence  is  that 
generalisation  is  impossible,  and  where  a  law  might  be  found 
one  can  see  little  but  a  great  variety  of  attempts  by  energetic 
workers  to  reach  their  own  conclusions  regardless  of  the  value 
of  comparative  statistics. 


CHAPTER   II. 


Frequency  Distribution. 

1.  If  statistics  are  arranged  so  as  to  show  the  number  of  times, 
or  frequency  with  which,  an  event  happens  in  a  particular 
way,  then  the  arrangement  is  a  frequency  distribution. 
Although  some  of  our  results  will  be  of  wider  applicability, 
we  shall  generally   confine  our  attention  to  these  distributions. 

2.  It  is  necessary  to  have  a  name  for  the  formula  used  to 
describe  such  distributions,  and  the  term  frequency-curve  has 
been  adopted  for  the  purpose.  The  geometrical  progression 
which  describes  the  number  of  sequences  in  any  direct 
experiment,  such  as  coin  tossing  or  dice  throwing,  is  a 
frequency-curve,  the  equation  to  which  is  $y = Na^x$.

3.  Some  distributions  give  the  number  of  cases  falling  in  a
certain  group  of  values  of  the  independent  variable,  while
others  (e.g.,  Example  V.  of  Table  I.)  give  the  number  of  cases
for  an  exact  value,  and  in  the  former  case  the  exact  values  of
the  independent  variable  to  which  the  groups  correspond
must  be  considered;  for  instance,  "exposed  to  risk  at  age  x"
includes  those  from  $x - \tfrac{1}{2}$  to  $x + \tfrac{1}{2}$,  but  the  number  of  deaths
at  duration  n  those  from  n  to  n+1.  When  statistics  are 
represented  graphically,  effect  should  be  given  to  these 
differences,  and,  to  bring  out  the  points  a  little  more  clearly, 
the  diagrams  on  pp.  6  and  7  have  been  prepared.  The 
drawings  of  distributions,  such  as  those  in  the  diagram,  are 
called  frequency  polygons  or  histograms. 

4.  When  statistics  give  the  number  of  cases  for  an  exact 
value  of  the  independent  variable,  it  is  simple  to  plot  them  in 
a  diagram  by  drawing  ordinates  and  joining  their  tops,  but  in 
the  case  of   groups  of  values  there  is  a  little  complication,  for 
we  can  either  draw  a  rectangle  standing  on  the  entire  base
(Ex.  II.  of  diagram)  or  put  in  ordinates  at  the  middle  points
of  the  bases  and  then  join  their  tops  (Ex.  III.).  The  former 
method  seems  to  give  a  better  idea  of  the  amount  of 
information  conveyed  by  the  statistics,  but,  for  some  purposes 
(e.g.,  for  seeing  the  possible  shape  of  the  curve),  the  latter  is 
more  convenient. 

Table I.

Example I.: Withdrawals with monthly incidence "0" in year of exit
(p. 92, Principles & Methods), by curtate duration.
Example II.: Exposed to risk of Sickness (Watson, M.U. Tables, p. 19), by age.
Example III.: Existing at close of observations, Without Profit "Old"
Assurances, by age.
Example IV.: Existing at close of observations, "Old" Annuities (Females), by age.
Example V.: Terms of the expansion of 1000(1/4 + 3/4)^12, by number of term.

Curtate     Ex. I.    Ages      Ex. II.     Ex. III.    Ex. IV.     Ex. V.    No. of
Duration                                                                      term

1           308       -19       34          ...         ...         32        1
2           200       20-24     145         ...         ...         127       2
3           118       25-29     156         ...         ...         232       3
4           69        30-34     145         ...         ...         258       4
5           59        35-39     123         ...         ...         194       5
6           44        40-44     103         3           ...         103       6
7           29        45-49     86          9           ...         40        7
8           28        50-54     71          42          ...         11        8
9           26        55-59     55          111         29          2         9
10          21        60-64     37          176         23          1         10
11          18        65-69     21          200         81          ...       11
12          18        70-74     13          193         151         ...       ...
13          12        75-79     7           160         192
14          11        80-84     3           73          239
15          5         85-89     1           26          157
16          11        90-94     ...         6           93
17          7         95-99     ...         1           29
18          6         100-      ...         ...         6
19          1
20          3
21          1
22          3
23          2

Total       1,000               1,000       1,000       1,000       1,000
True Total  1,308               2,995,721   2,674       172         ...
Mean        4.182               37.8750     68.485      79.400      3.998
Standard
Deviation   4.1996              2.76810     1.771288    1.774894    1.46215
Type        I                   I

5.  If  the  reader  will  now  examine  the  examples  in  Table  I.,
he  will  notice  that  the  statistics  tend  towards  a  smooth  series
as  the  total  number  of  cases  is  increased,  and  from  this  it
can  be  seen  how  naturally  practical  statistics  lead  to  the 
conception  of  a  frequency-curve  to  describe  the  smooth 
distribution  that  would  be  obtained  if  an  infinite  supply  of 
homogeneous  material  were  available  for  investigation.  In 
other  words,  such  curves  would  give  an  approximation  to  the 
total  " population"  of  which  the  particular  case  investigated 
is  a  sample. 

6.  It  may  be  noticed  that  a  frequency-curve  will  give  a 
frequency  corresponding  to  every  value  of  the  independent 
variable  along  the  whole  range  of  the  distribution,  and  will 
not  restrict  us  to  a  few  more  or  less  arbitrary  groups  as  is 
necessarily  done  by  the  actual  statistics.  The  binomial  series 
and  geometrical  progression  do  the  same  when  we  imagine  we 
are  dealing  with  something  that  can  be  divided  into  a  very 
large  number  of  groups.  Thus,  if  we  mix  a  large  quantity  of 
sand  of  two  colours  and  take  out  a  fixed  quantity  of  the 
mixture  and  record  the  number  of  grains  of  sand  of  either 
colour  in  each  drawing,  we  should  obtain  a  continuous  curve 
from  a  large  number  of  trials. 

7.  We  will  now  define  some  important  functions.  When  a 
distribution  is  arranged  according  to  the  progressive  values 
of  a  variable  characteristic,  e.g.,  duration,  age,  &c,  the 
average  value  of  that  characteristic  (not  the  average  of  the 
frequencies)  is  called  the  mean  of  the  distribution,  and  is 
given  by 

$$\frac{f_a \times a + f_b \times b + f_c \times c + \dots + f_n \times n}{f_a + f_b + f_c + \dots + f_n},$$

where $f_r$ is the frequency corresponding to $r$; thus, in
Example I., 200 is the frequency corresponding to 2. If we
assume infinitesimal increments, the mean is given by

$$\frac{\int f_x \times x \, dx}{\int f_x \, dx},$$

where  the  limits  of  the  integral  will  be  such  as  to  cover  the 
whole  distribution.  The  mean  could  also  be  described  as 
the  position  of  the  ordinate  through  the  centre  of  gravity 
of  the  distribution  (centroid  vertical),  and  this  may  be  of 
help  to  some  readers. 
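The calculation of the mean may be illustrated in Python with the frequencies of Example II. of Table I. This is a sketch under stated assumptions: each quinquennial group is represented by its central age, and the first group ("-19") is treated as ages 15 to 19, an assumption made here because it reproduces the mean printed in the table. The function name is hypothetical.

```python
# Central ages of the groups "-19", "20-24", ..., "85-89"
# (first group assumed to be 15-19, so its central age is 17).
central_ages = [17 + 5 * k for k in range(15)]

# Frequencies of Example II. (exposed to risk of sickness, per 1,000).
frequencies = [34, 145, 156, 145, 123, 103, 86, 71, 55, 37, 21, 13, 7, 3, 1]

def distribution_mean(values, freqs):
    """Sum of frequency times value, divided by the sum of the frequencies."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

mean_age = distribution_mean(central_ages, frequencies)
```

It is the ages that are averaged, each weighted by its frequency, and not the frequencies themselves.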
8.  The  mode  is  the  characteristic  that  occurs  most  frequently,
or,  in  other  words,  is  the  position  of  the  maximum  ordinate, 
and  its  calculation  can  therefore  only  be  made  approximately, 
unless  we  know  the  law  connecting  the  various  groups. 
We  cannot  find  the  mode  exactly  until  we  know  the 
frequency-curve,  because  it  is  the  position  of  an  ordinate, 
and  we  cannot  tell  from  the  rough  statistics  which  ordinate  is 
greatest. 

9.  Now  since  one  equation  or  curve  might  be  used  for  several
distributions,  one  given  according  to  age,  a  second  of  a 
different  subject  according  to  duration,  a  third  according  to 
sums  assured,  and  so  on,  we  must  have  a  standard  of  reference 
based  on  the  distribution  itself.  For  this  purpose  a  function 
known  as  the  standard  deviation  is  used.     It  is  given  by 


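$$\sqrt{\frac{f_a a'^2 + f_b b'^2 + \dots + f_n n'^2}{f_a + f_b + \dots + f_n}},$$

where $a'$, $b'$, ... $n'$ are the distances from the mean. In the
form of integrals the standard deviation is

$$\sqrt{\frac{\int f_x \times x^2 \, dx}{\int f_x \, dx}},$$

where the distances $x$ are measured from the mean.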

The  standard  deviation  measures  the  way  the  frequencies 
are  distributed  in  terms  of  the  unit  of  measurement.  Since 
the  frequencies  furthest  from  the  mean  are  multiplied  by  the 
largest  values  of  $x^2$,  a  large  standard  deviation  shows  that  the
frequency  distribution  spreads  out  from  the  mean,  while  a 
small  standard  deviation  shows  that  the  frequency  is  closely 
concentrated  about  the  mean.  In  considering  the  relative 
sizes  of  standard  deviations,  it  is  necessary  to  bear  in  mind 
the  unit  of  measurement,  because,  if  a  given  distribution  is 
arranged  in  two  series,  first,  according  to  years  of  age,  and 
then  in  quinquennial  age  groups,  the  standard  deviation  will 
be  five  times  as  large  in  the  latter  case  as  it  is  in  the  former. 
This  can  be  seen  at  once  by  comparing  the  two  expressions 

$$\sqrt{\frac{\sum f_x x^2}{\sum f_x}} \quad \text{and} \quad \sqrt{\frac{\sum f_x (5x)^2}{\sum f_x}}.$$

The  latter  is  obviously  five  times  the  former.     The  values  of 
the  standard  deviations  are  given  in  Table   I.  for   each   case. 
The  diagram,  on  p.  12,  shows  two  curves  having  the  same

mean  B  and  approximately  the  same  area,  but  the  dotted 
curve  has  the  larger  standard  deviation  because  it  spreads 
out  more  on  each  side  of  the  mean. 

The  reader  will  notice  from  the  algebraic  expressions 
given  above  that  the  standard  deviation  is  not  dependent  on 
the  number  of  cases  (i.e.,  on  the  absolute  size  of  the  curve) , 
but  merely  on  the  way  they  are  distributed  (i.e.,  on  the 
proportionate  numbers  or  the  shape  of  the  curve)  ;  it 
measures  the  "  spread "  or  "  scatter  "  of  the  statistics  from 
the  mean. 
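The two properties just stated (independence of the number of cases, and dependence on the unit of measurement) may be checked numerically. The following is a minimal sketch in Python; the distribution is hypothetical and is not taken from Table I., and the function name is an assumption made for illustration.

```python
import math

def standard_deviation(values, freqs):
    """Standard deviation of a grouped distribution, the frequencies acting as weights."""
    total = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / total
    second = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / total
    return math.sqrt(second)

# Hypothetical distribution (not from Table I.).
ages = [20, 25, 30, 35, 40]
freqs = [10, 40, 90, 45, 15]

sd_years = standard_deviation(ages, freqs)
# Ten times as many cases in the same proportions: the spread is unchanged.
sd_more_cases = standard_deviation(ages, [10 * f for f in freqs])
# The same distances expressed in units of five years instead of single years.
sd_five_year_units = standard_deviation([a / 5 for a in ages], freqs)

print(sd_years, sd_more_cases, sd_five_year_units)
```

The figure obtained in years is five times the figure obtained in five-year units, which is why the unit must be borne in mind when two standard deviations are compared.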

10.  An  examination  of  frequency  distributions  (see  Table  I. 
and  pp.  6  and  7)  shows  that  most  of  them  start  at  zero, 
gradually  rise  to  a  maximum,  and  then  fall  sometimes  at  a 
very different rate. If the rise and fall are at the same rate,
the distribution will be symmetrical about its mean, which will
obviously coincide with the mode. The difference between
the mean and mode is therefore a function of the skewness or
deviation from symmetry. In order to get a satisfactory
measure, the way the material is grouped must be
taken into account, and this leads us to measure skewness
by (distance between mean and mode) ÷ standard deviation.
If  the  mean  is  on  the  left-hand  side  of  the  mode  when  the 
statistics  are  plotted  out  in  diagram,  this  function  will  be 
negative,  and  to  remember  the  sign  it  is  convenient  to  write  : 

$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$$

The  diagram  will  help  to  show  the  rationale  of  the  measure 
for  skewness.  It  gives  two  curves  having  the  same  mean  B 
and  the  same  mode  A,  but  with  different  standard  deviations, 
and  it  is  clear  that  the  dotted  curve,  with  its  larger  standard 
deviation,  is  more  nearly  symmetrical  than  the  other  curve. 

11.  We  may  summarise  these  functions  by  saying  that  the 
mean  and  mode  fix  the  position  of  the  curve  on  the  axis ;  the 
standard  deviation  shows  how  the  material  is  distributed 
about  the  mean,  and  the  skewness  shows  the  amount  of  the 
deviation  from  symmetry  exhibited  by  the  material. 

These  preliminary  definitions  will  be  sufficient  for  our 
present  purpose,  but  the  functions  defined  will  be  more  easily 
understood  when  their  actual  connection  with  the  practical 
work of curve fitting has been studied. A student working
at the subject for the first time should plot out several
distributions on cross-ruled paper, in order to familiarise
himself with their nature and appearance. He should
calculate and insert the means in the diagrams, but should
not attempt to calculate standard deviations until he knows
something of the method of moments.

[Diagram: two curves with the same mean B and the same mode A; the dotted curve has the larger standard deviation.]


CHAPTER    III. 


Method    of   Moments. 

1.  Before  we  proceed  to  deal  with  suitable  forms  for  use  as 
frequency-curves,  it  will  be  well  to  see  if  some  method  of 
applying  them  to  statistical  examples  can  be  found,  for  it  is 
clearly  useless  to  suggest  a  curve  and  have  no  way  of  using  it. 
We  require,  therefore,  a  general  method  by  which  a  given 
formula  can  be  fitted  to  a  particular  statistical  experience, 
and  may  be  applied  to  any  expression  (for  instance, 
Makeham's  formula  for  the  force  of  mortality)  on  which  we 
may  have  decided  as  the  basis  of  graduation.  The  first  point 
to  be  noticed  in  searching  for  such  a  method  is  that  if  there 
are  n  constants  in  the  formula,  we  must  form  n  equations 
between  the  formula  and  the  statistics.  Thus,  if  we  have 
three terms, say, y = 20, 40 and 88, when x = 1, 2, and 3
respectively, and wish to use the curve $y = a + bx + cx^2$ to
describe them, we can, of course, find values of a, b and c so
that each item is exactly reproduced by equating as follows:

$$a + b + c = 20$$
$$a + 2b + 2^2 c = 40$$
$$a + 3b + 3^2 c = 88$$

But if we have a fourth term y = 96 when x = 4, and use
the values of a, b and c found from the three equations just
given, we should find that when x = 4, y = 164. This suggests
that  when  there  are  more  terms  in  the  statistics  than  there 
are  constants,  the  equations  must  be  formed  by  using  all  the 
terms,  not  by  selecting  from  them.  The  graduating  curve 
will  not  necessarily  reproduce  exactly  any  of  the  observations 
but  will  run  evenly  through  the  roughnesses  of  the  observed 
facts  so  as  to  represent  their  general  trend. 
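The failure of the exact fit at the fourth term can be reproduced in a few lines. The sketch below is written in Python with exact fractions; the helper `solve_linear` is an illustrative name introduced here, not anything from the text. It passes the curve through the first three terms and then evaluates it at x = 4.

```python
from fractions import Fraction

def solve_linear(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into the pivot position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

xs, ys = [1, 2, 3], [20, 40, 88]
# One equation a + b*x + c*x^2 = y for each of the three terms.
a, b, c = solve_linear([[1, x, x * x] for x in xs], ys)
y_at_4 = a + 4 * b + 16 * c

print(a, b, c, y_at_4)  # 28 -22 14 164
```

The curve reproduces the three terms exactly but gives 164 at x = 4, against the observed 96.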
2. Let $a_1, a_2, a_3 \ldots a_n$ be n terms to be graduated; then, if
the series were perfectly smooth and followed a known law,
each term could be reproduced exactly by, say, $b_1, b_2, b_3 \ldots b_n$
where $a_1 = b_1$, $a_2 = b_2$, $a_3 = b_3$ ... and $a_n = b_n$. Now, if we
consider the two series (the a's and the b's), we see that
since each term is reproduced exactly

$$\sum_{r=1}^{r=n} a_r = \sum_{r=1}^{r=n} b_r$$

and

$$\sum_{r=1}^{r=n} c_r a_r = \sum_{r=1}^{r=n} c_r b_r$$

where $c_r$ is a numerical coefficient.

This  suggests  a  possible  method  to  apply  when  each  term 
cannot  be  reproduced  exactly.  The  total  of  the  graduated 
figures  must  be  made  equal  to  the  total  of  the  ungraduated, 
and  the  further  equations  necessary  for  finding  the  unknown 
constants  must  be  formed  by  multiplying  the  various  terms 
by  different  factors  and  similarly  equating  the  sums  of  the 
graduated and ungraduated products, i.e., $\Sigma c_r a_r = \Sigma c_r b_r$. It
still remains to decide the best form to be given to $c_r$,
and the mean being equal to

$$\frac{a_1 + 2a_2 + \ldots + n a_n}{a_1 + a_2 + \ldots + a_n}$$

suggests that $c_r = r$ should give one reasonable equation.
Again, since we shall have to use some function of r
which, when applied to the graduation formula, will give
an integrable form (otherwise, we cannot make an
equation between $\Sigma c_r b_r$ and $\Sigma c_r a_r$), the powers of r
suggest themselves as convenient when integration by parts
is attempted. If, therefore, we write $c_r = r^t$ and give t
successively the values 0, 1, 2 ... we can obtain as many
equations as we require, and the first two of them give the
area and mean of the distribution, which will be the same in
the graduated and ungraduated figures.

This  method  is  known  as  the  Method  of  Moments, 
(cf.,  moments  of  inertia),  and  Professor  Karl  Pearson  has 
recently shown (Biometrika, vol. i., pp. 267, &c.), that it can
be  expected  to  give  very  good  results. 

3.  Applying  the  method  to  solve  the  three  equations  given 
above,  we  have 
$$(a + b + c) + (a + 2b + 2^2 c) + (a + 3b + 3^2 c) = 20 + 40 + 88$$
$$(a + b + c) + 2(a + 2b + 2^2 c) + 3(a + 3b + 3^2 c) = 20 + 2 \times 40 + 3 \times 88$$
$$(a + b + c) + 2^2(a + 2b + 2^2 c) + 3^2(a + 3b + 3^2 c) = 20 + 2^2 \times 40 + 3^2 \times 88$$

or

$$3a + 6b + 14c = 148$$
$$6a + 14b + 36c = 364$$
$$14a + 36b + 98c = 972$$

These equations will give the same result as those from which
they were formed, because each of the three terms can be
graduated exactly; but if we introduce the fourth term,
x = 4, y = 96, we can modify the moment method by adding a
fourth term to each equation given above and obtain

$$4a + 10b + 30c = 244$$
$$10a + 30b + 100c = 748$$
$$30a + 100b + 354c = 2,508$$

The solution of these equations gives

a = -23
b = 42.6
c = -3

or   x = 1   y = 16.6
     x = 2   y = 50.2
     x = 3   y = 77.8
     x = 4   y = 99.4

This  is  a  very  simple  example,  but  it  will  probably  help 
to  show  the  way  results  are  reached,  and  will  serve  as  a 
foundation  for  what  follows. 
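The four-term example can be worked mechanically. The sketch below, in Python with exact fractions, forms the three moment equations for the curve $y = a + bx + cx^2$ from the four observed terms and solves them; `solve_linear` is a helper name assumed for illustration. Solving the three four-term equations as printed in this way yields a = -23, b = 42.6 and c = -3, and the graduated values then have the same total and the same first and second moments as the observations.

```python
from fractions import Fraction

def solve_linear(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

xs = [1, 2, 3, 4]
ys = [20, 40, 88, 96]

# Row t equates the t-th moment of a + b*x + c*x^2 to that of the data (t = 0, 1, 2).
matrix = [[sum(Fraction(x) ** (t + p) for x in xs) for p in range(3)] for t in range(3)]
rhs = [sum(Fraction(x) ** t * y for x, y in zip(xs, ys)) for t in range(3)]

a, b, c = solve_linear(matrix, rhs)
graduated = [a + b * x + c * x * x for x in xs]

print(matrix, rhs)
print(float(a), float(b), float(c))
print([float(g) for g in graduated])
```

A useful check on any such solution is that the graduated values add up to the observed total, here 244.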

4. The nth moment of a particular frequency is defined as
the product of the frequency and the nth power of the
distance of the frequency from the vertical about which
moments are being taken; or the nth moment of any ordinate
y of a frequency-curve about the vertical through a point
distance x from it is $yx^n$, and the nth moment of the whole
distribution treated as a series of ordinates is $y_1 x_1^n + y_2 x_2^n + \ldots$
where $y_1 + y_2 + \ldots$ is the total frequency. Thus, in
Example IV., the third moment of the frequency 81 about
the vertical through age 77 is $81 \times (-2)^3$ where 5 years is
the unit distance.

5. If the ordinates are known, we can calculate the moment
for them immediately by multiplying the frequencies by the
powers of the distances between them and the vertical about
which the moments are required and then adding the results,
care being taken to give the distances their proper signs. If
areas  are  given,  an  approximation  is  made  by  assuming  them 
to  be  concentrated  about  the  ordinates  at  the  middle  points  of 
the  bases  on  which  they  stand — the  moments  thus  obtained 
are  sometimes  said  to  be  based  on  "  loaded  ordinates."  The 
columns  after  the  third  in  Table  II.  show  the  calculation  of 
the  moments  about  the  vertical  through  age  77  for 
Example  IV.  of  Table  I.,  on  the  assumption  that  the  frequencies 
are  concentrated  at  the  middle  points  of  the  bases. 

Table II.

 Central
 Age of                (x - 77)/5
 Group x   Frequency f    = s        f x s     f x s^2     f x s^3     f x s^4
   (1)        (2)         (3)         (4)        (5)         (6)         (7)

    57         29         -4         -116        464      -1,856       7,424
    62         23         -3          -69        207        -621       1,863
    67         81         -2         -162        324        -648       1,296
    72        151         -1         -151        151        -151         151
    77        192          0            0          0           0           0
    82        239          1          239        239         239         239
    87        157          2          314        628       1,256       2,512
    92         93          3          279        837       2,511       7,533
    97         29          4          116        464       1,856       7,424
   102          6          5           30        150         750       3,750

 Positive terms                      +978                  +6,612
 Negative terms                      -498                  -3,276
 Totals     1,000                    +480      3,464       +3,336      32,192
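The arithmetic of Table II. can be repeated by machine. The following Python sketch uses the central ages and frequencies of the table; the function name and its parameters are assumptions made for illustration. It returns the moments about age 77 in units of five years, the total frequency being reduced to unity.

```python
ages = [57, 62, 67, 72, 77, 82, 87, 92, 97, 102]      # column (1)
freqs = [29, 23, 81, 151, 192, 239, 157, 93, 29, 6]   # column (2)

def moments_about(ages, freqs, origin, unit, max_order=4):
    """Moments of orders 0..max_order about `origin`, measured in multiples of
    `unit`, with the total frequency reduced to unity."""
    total = sum(freqs)
    steps = [(age - origin) / unit for age in ages]    # column (3)
    return [sum(f * s ** t for f, s in zip(freqs, steps)) / total
            for t in range(max_order + 1)]

nu = moments_about(ages, freqs, origin=77, unit=5)
mean_age = 77 + 5 * nu[1]

print(nu)        # moments of orders 0 to 4 about age 77
print(mean_age)  # close to 79.4
```

Dividing by the total frequency is the same step as dividing the totals of columns (4) to (7) by 1,000.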

Notation for Moments.

N = total frequency.

$\nu_n$ = nth unadjusted statistical moment about mean.

$\nu'_n$ = nth unadjusted statistical moment about any other point.

$\mu_n$ = nth moment from curve about mean.
      = nth adjusted statistical moment about mean.

$\mu'_n$ = nth moment from curve about other point.
      = nth adjusted statistical moment about other point.

NOTE. $\nu$, $\nu'$, $\mu$ and $\mu'$ always refer to a total frequency of
unity.

The unit of grouping has been taken as 5 years, and if,
as is often convenient, we assume the total frequency to
be unity, the totals will have to be divided by 1,000.
We should generally deal with the actual numbers that
occur, but as they have been given in Table I. as the
distribution of 1,000 cases, it will be better to use them
in that way in the present case. The numbers -4, -3 ...
in column (3) show the distances from age 77 in terms
of the unit of grouping. The centre of any other group
would have done almost as well as 77; it is convenient to
choose the arbitrary origin so that it is near the mean of the
distribution. This makes easier the calculation of the
moments about the mean (a result frequently required), and
enables the calculator to get a rough check on these moments
by comparing them with those about the arbitrary origin.
The columns (4) to (7) are sufficiently explained by their
headings; they are formed successively and checked by
multiplying f by $s^4$, the values of $s^4$ being taken from a table
of the powers of the natural numbers.

6. It has so far been assumed that moments can be calculated
about any point, but it is frequently inconvenient to do so;
for if we had required them about age 79.4, we should
have had to multiply by the powers of $\frac{77 - 79.4}{5}$, $\frac{82 - 79.4}{5}$
and so on, and it is quite clear that the labour would have
been very great. In such a case we can, however, take the
moments about any other more convenient point, and then
modify them in the following way:

Let the distance between A, about which the moments are
known, and B, about which they are required, be +d; thus,
if we want moments about 25.7 and have found them about
25, d is .7; if we had found them about 26, d would have
been -.3.

Then, if the distance of any ordinate $y_r$ from A is $X_r$, and
from B is $x_r$, then

$$x_r = X_r - d$$

and

$$x_r^n = (X_r - d)^n.$$

Now, the nth moment of the whole distribution treated as
a series of ordinates is $\Sigma y_r X_r^n$ about A, and $\Sigma y_r x_r^n$ about B;
so we have

$$\nu''_n = \Sigma y_r x_r^n = \Sigma y_r (X_r - d)^n$$
$$= \Sigma\left[y_r\left(X_r^n - n d X_r^{n-1} + \ldots + (-1)^n d^n\right)\right]$$
$$= \nu'_n - n d\,\nu'_{n-1} + \frac{n(n-1)}{2!} d^2 \nu'_{n-2} - \ldots \qquad (1)$$

where $\nu''_n$ is written for the nth moment about B, and $\nu'_n$ the
nth moment about A.

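Instead of (1) we may proceed as follows:

$$\nu'_n = \Sigma y_r X_r^n = \Sigma y_r (x_r + d)^n$$
$$= \nu''_n + n d\,\nu''_{n-1} + \frac{n(n-1)}{2!} d^2 \nu''_{n-2} + \ldots$$

so that

$$\nu''_n = \nu'_n - n d\,\nu''_{n-1} - \frac{n(n-1)}{2!} d^2 \nu''_{n-2} - \ldots \qquad (2)$$

There is little to choose between these two formulae, and of
course they give identical results.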
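Formula (1) is simply the binomial expansion of $(X_r - d)^n$ applied term by term, and it may be checked against a direct recalculation about the new point. The sketch below is in Python; the ordinates are hypothetical, the function names are assumptions made for illustration, and the points 25 and 25.7 are those used in the text to explain d.

```python
from math import comb

def moments_about(points, weights, origin, max_order=4):
    """Moments of orders 0..max_order about `origin`, total frequency reduced to unity."""
    total = sum(weights)
    return [sum(w * (x - origin) ** n for x, w in zip(points, weights)) / total
            for n in range(max_order + 1)]

def shift_moments(about_a, d):
    """Formula (1): moments about B from the moments about A, where B lies d beyond A."""
    return [sum((-1) ** j * comb(n, j) * d ** j * about_a[n - j] for j in range(n + 1))
            for n in range(len(about_a))]

# Hypothetical ordinates, chosen only to exercise the formula.
points = [23, 24, 25, 26, 27, 28]
weights = [3, 9, 14, 11, 6, 2]

about_25 = moments_about(points, weights, 25)
shifted = shift_moments(about_25, 0.7)             # found about 25, wanted about 25.7
direct = moments_about(points, weights, 25.7)      # the laborious direct calculation

print(shifted)
print(direct)
```

The two lists agree apart from rounding in the last places, the transferred moments needing only the integral distances from 25.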

7. We will now apply formula (2) to work out the moments
about the centroid vertical (i.e., vertical through the mean)
for the example in Table II. The distance of the mean from
any point is

$$\frac{\Sigma(X_r y_r)}{\Sigma y_r} = \frac{\Sigma(X_r y_r)}{N}$$

where  N  is  the  total  frequency;  or  we  may  say  that  the 
distance  of  the  mean  from  any  point  is  the  first  moment  of 
the  distribution  about  the  vertical  through  that  point.  It 
follows  that  the  first  moment  about  the  centroid  vertical  is 
zero,  and  this  leads  me  to  prefer  formula  (2)  to  formula  (1) 
when  moments  are  required  about  that  vertical.  When  we 
come  to  deal  with  frequency-curves,  we  shall  see  that  this  is 
generally  the  case. 

8.  The  arithmetical  work  is  as  follows  : — 

The totals in cols. (4) to (7) are divided by the number of
observations (total of col. (2)), and the quotients are the
moments ($\nu'$) about 77. The moments are dealt with as having
reference to a case where unity is the total frequency,
i.e., proportional, not actual, frequencies are dealt with.

$\nu'_1$ = .480       $\nu'_2$ = 3.464
$\nu'_3$ = 3.336      $\nu'_4$ = 32.192

The value of $\nu'_1$ gives the mean age = 77 + 5 × .480 = 79.4.
In order to use formula (1) or (2), the value of d is required,
and when the calculation of moments has to be made about
the centroid vertical its value is, as we have seen above, the
same as $\nu'_1$; in the present case it is the first moment about
the vertical through age 77. The powers of d are next
calculated by logarithms; as it happens d is a comparatively
simple number; if it had been .48327, say, the propriety of
using logarithms would have been more obvious:

$d^2$ = .2304      $d^3$ = .110592      $d^4$ = .0530842.

In modifying formula (2) for the particular case in which
moments about the centroid vertical are required, it
should be remembered that $\nu_1$ is zero and $\nu_0$ is unity, because
it is merely the total frequency divided by the total
frequency.

$$\nu_2 = \nu'_2 - d^2 = 3.2336$$
$$\nu_3 = \nu'_3 - 3d\nu_2 - d^3 = -1.430976$$
$$\nu_4 = \nu'_4 - 4d\nu_3 - 6d^2\nu_2 - d^4 = 30.416289.$$

A seven-place logarithm table and antilogarithm table
(such as Filipowski's), an Arithmometer or Brunsviga should
be used. It will be noticed that $6d^2\nu_2$ can be formed very
easily from $3d\nu_2$ when logarithms are used, as log d is known.
It is useful to keep a note of this value (log d) in a conspicuous
place when the moments are being calculated.
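The reduction to the centroid vertical by formula (2) can be sketched as follows in Python, starting from the four moments about age 77 quoted above. The function name is an assumption made for illustration. Each moment about the mean is obtained from the raw moment of the same order and the moments about the mean already found.

```python
from math import comb

def central_from_raw(raw):
    """Formula (2) with d equal to the first moment: moments about the mean from
    moments about an arbitrary origin (raw[0] must be unity)."""
    d = raw[1]
    central = [1.0, 0.0]     # moment of order 0 is unity, first moment about the mean is zero
    for n in range(2, len(raw)):
        correction = sum(comb(n, j) * d ** j * central[n - j] for j in range(1, n + 1))
        central.append(raw[n] - correction)
    return central

raw_about_77 = [1.0, 0.480, 3.464, 3.336, 32.192]
central = central_from_raw(raw_about_77)
mean_age = 77 + 5 * raw_about_77[1]

print(central)
print(mean_age)
```

Worked in floating point the second and third moments reproduce the printed figures, and the fourth comes to about 30.4163, agreeing with the values in the text to about four decimal places.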
9.  Although  the  above  is  the  most  direct  and  obvious  way 
by  which  moments  can  be  calculated,  another  method  was 
suggested by Mr. G. F. Hardy and used by him in his recent
graduation of the British Offices Life Tables. He pointed out
that by summing the statistical numbers and forming a new
series in the same way as the $N_x$ column is formed from the
$D_x$ column and then summing these results (cf. the $S_x$ column),
and  so  on,  equations  can  be  formed.  So  far  as  I  can  trace, 
Mr.  Hardy  has  not  shown  the  connection  between  the 
summation  method  and  the  direct  calculation  of  the  moments, 
though  he  has  pointed  out  that  the  same  results  can  be 
obtained. 

The  arrangement  on  p.  20  shows  both  the  method  of 
calculation  and  the  form  of  the  expression  obtained  by  the 
process. 

Considering the line opposite the first term, we notice that
the sum of the series is given, and that the second summation,
which we will call $S_2$ when the total frequency is taken as
unity, gives the first moment of the whole distribution about a
vertical situated at unit distance before the point corresponding
to f(1). Still considering only the first line, we see that
$S_3$ gives each function multiplied by coefficients of the form
$\frac{n(n+1)}{2!}$ or $\frac{n^2 + n}{2!}$, i.e., it gives $\frac{\nu'_2 + \nu'_1}{2}$,
where $\nu'$ is written for the moment, because by definition the
tth moment ($\nu'_t$) of the whole distribution is given by the sum
of $n^t f(n)$ for all values of n. $S_4$ and $S_5$ give each function
multiplied by $\frac{n^3 + 3n^2 + 2n}{3!}$ and $\frac{n^4 + 6n^3 + 11n^2 + 6n}{4!}$
respectively.

The following equations result:

$$S_2 = \nu'_1$$
$$S_3 = \tfrac{1}{2}(\nu'_2 + \nu'_1)$$
$$S_4 = \tfrac{1}{6}(\nu'_3 + 3\nu'_2 + 2\nu'_1)$$
$$S_5 = \tfrac{1}{24}(\nu'_4 + 6\nu'_3 + 11\nu'_2 + 6\nu'_1)$$

These equations enable us to calculate the moments about the
selected origin, but if it is necessary to find moments about
the mean, the following relations are more convenient; they
can be reached by substituting in the above the values in
formula (2), and remembering that $S_2 = d$.

$$\nu_2 = 2S_3 - d(1+d)$$
$$\nu_3 = 6S_4 - 3\nu_2(1+d) - d(1+d)(2+d)$$
$$\nu_4 = 24S_5 - 2\nu_3\{2(1+d)+1\} - \nu_2\{6(1+d)(2+d) - 1\} - d(1+d)(2+d)(3+d).$$

10.  The  following  table  shows  the  working  in  the  numerical 
example  already  dealt  with  by  the  direct  method.  The  fifth 
sum  is  unnecessary,  as  the  total  of  the  items  in  the  fourth  sum 
gives  the  only  value  required  : — 

Table IV.

                         First     Second      Third     Fourth
 Frequency                Sum.       Sum.       Sum.       Sum.

     29                  1,000      5,480     19,372     54,508
     23                    971      4,480     13,892     35,136
     81                    948      3,509      9,412     21,244
    151                    867      2,561      5,903     11,832
    192                    716      1,694      3,342      5,929
    239                    524        978      1,648      2,587
    157                    285        454        670        939
     93                    128        169        216        269
     29                     35         41         47         53
      6                      6          6          6          6

 Totals
 (for check)   1,000     5,480     19,372     54,508    132,503

From  the  totals  of  the  columns  we  have — 

$S_2 = d = 5.48$, $S_3 = 19.372$, $S_4 = 54.508$, and $S_5 = 132.503$.

The first value $S_2$ or d shows that the mean is at age
52 + 5.48 × 5 = 79.4. The age 52 is used because it is the
centre of the group before that in which numbers occur and,
as has been already remarked, the summation method assumes
the work to be done with reference to this position. The
application of the formulae for $\nu_2$, $\nu_3$, and $\nu_4$, given above,
enables us to find

$\nu_2$ =   3.2336
$\nu_3$ =  -1.43099
$\nu_4$ =  30.4164
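The whole of the summation process, from the frequencies to the moments about the mean, can be sketched in Python as below. The function name is an assumption made for illustration; the frequencies are those of the example, and each new column is formed by adding the preceding column continuously from the foot upwards.

```python
def running_sums_from_end(values):
    """Each entry is the sum of that entry and all entries below it in the column."""
    out, acc = [], 0
    for v in reversed(values):
        acc += v
        out.append(acc)
    return out[::-1]

freqs = [29, 23, 81, 151, 192, 239, 157, 93, 29, 6]
total = sum(freqs)

# S[2] is the total of the first sum column divided by the total frequency, and so on.
S, column = {}, freqs
for k in range(2, 6):
    column = running_sums_from_end(column)
    S[k] = sum(column) / total

d = S[2]
nu2 = 2 * S[3] - d * (1 + d)
nu3 = 6 * S[4] - 3 * nu2 * (1 + d) - d * (1 + d) * (2 + d)
nu4 = (24 * S[5] - 2 * nu3 * (2 * (1 + d) + 1)
       - nu2 * (6 * (1 + d) * (2 + d) - 1)
       - d * (1 + d) * (2 + d) * (3 + d))
mean_age = 52 + 5 * d

print(S)
print(nu2, nu3, nu4, mean_age)
```

No multiplication by powers of the distances is needed until the final step, which is the practical attraction of the method.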

11.  This  is  the  most  obvious  way  of  using  the  summation 
method,  but  if  the  series  contains  a  great  number  of  terms,  it 
is  more  convenient  to  use  a  central  term  instead  of  the  first 
term  as  the  starting  point  for  the  summation.*  A  slight 
adjustment  is  then  needed  because,  though  there  is  no  difficulty 
about  the  calculation  of  the  sum  for  the  terms  on  the  positive 
side  of  the  selected  point,  the  moments  for  the  terms  on  the 
negative  side  are  formed  by  multiplying  by  the  powers  of 
negative quantities. In order to use the formulae given above,
we require $\Sigma n f(n)$; $\Sigma \frac{n(n+1)}{2} f(n)$, and so on, or, when n is
negative, $\Sigma -n f(-n)$; $\Sigma \frac{-n(-n+1)}{2} f(-n)$ or $\Sigma \frac{n(n-1)}{2} f(-n)$;
$\Sigma \frac{-n(-n+1)(-n+2)}{6} f(-n)$ or $-\Sigma \frac{n(n-1)(n-2)}{6} f(-n)$; and
$\Sigma \frac{-n(-n+1)(-n+2)(-n+3)}{24} f(-n)$ or $\Sigma \frac{n(n-1)(n-2)(n-3)}{24} f(-n)$.

The first of these is given by the last term in the
ordinary second summation taken negatively; the second
is seen from Table III. to come from the term before the
last in the ordinary third sum; the third is the second
term before the last in the fourth sum taken negatively;
and the fourth is the third term before the last in the
fifth sum; the sums in each case being begun from the
central term but in the reverse direction from the sums on the
positive side. To make the method clearer the following table
has been prepared, showing the calculation of the summation
about the centre of the group of which the frequency is 192.

* I have to thank Mr. G. J. Lidstone for the suggestion of this improvement
in the method.

Table IV. (A).

                 First     Second      Third     Fourth      Fifth
 Frequency.       Sum.       Sum.       Sum.       Sum.       Sum.

     29             29         29         29         29         29
     23             52         81        110        139
     81            133        214        324
    151            284        498

    192

    239            524        978      1,648      2,587      3,854
    157            285        454        670        939
     93            128        169        216        269
     29             35         41         47         53
      6              6          6          6          6

  1,000

Hence

$$S_2 = .978 - .498 = .48$$
$$S_3 = 1.648 + .324 = 1.972$$
$$S_4 = 2.587 - .139 = 2.448$$
$$S_5 = 3.854 + .029 = 3.883$$

and

$$\nu_2 = 2 \times 1.972 - .48 \times 1.48 = 3.2336$$

and similarly

$$\nu_3 = -1.43097$$
$$\nu_4 = 30.41621$$

agreeing with the previous results.
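The values of $S_2$ to $S_5$ about the central group can also be checked without forming the columns, by applying the coefficients $n$, $\frac{n(n+1)}{2}$, $\frac{n(n+1)(n+2)}{6}$ and $\frac{n(n+1)(n+2)(n+3)}{24}$ directly to each frequency, n being taken negative on one side of the centre. The following Python sketch does this; it is offered as a check on the table rather than as the summation process itself, and the names used are assumptions made for illustration.

```python
from math import factorial

# Frequencies keyed by distance (in groups) from the central group of frequency 192.
freq_at = {-4: 29, -3: 23, -2: 81, -1: 151, 0: 192,
           1: 239, 2: 157, 3: 93, 4: 29, 5: 6}
total = sum(freq_at.values())

def summation(k):
    """S_k: sum of f(n) * n(n+1)...(n+k-2) / (k-1)!, the total frequency being unity."""
    acc = 0
    for n, f in freq_at.items():
        product = 1
        for j in range(k - 1):
            product *= n + j
        acc += f * product
    return acc / factorial(k - 1) / total

S2, S3, S4, S5 = (summation(k) for k in range(2, 6))

d = S2
nu2 = 2 * S3 - d * (1 + d)
nu3 = 6 * S4 - 3 * nu2 * (1 + d) - d * (1 + d) * (2 + d)
nu4 = (24 * S5 - 2 * nu3 * (2 * (1 + d) + 1)
       - nu2 * (6 * (1 + d) * (2 + d) - 1)
       - d * (1 + d) * (2 + d) * (3 + d))

print(S2, S3, S4, S5)
print(nu2, nu3, nu4)
```

The coefficients vanish for the nearest terms on the negative side, which is why only the term before the last, the second before the last, and so on, are wanted from the later columns of the table.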

12.  A  comparison  of  Table  IV. (A)  with  Table  IV.  will  show 
that  a  saving  of  numerical  work  is  effected  by  using  a  central 
point  as  the  starting  point  for  the  summation,  for  the  sums 
are  numerically  smaller  and  the  value  of  S2  or  d,  which  enters 
into  the  formulae  on  p.  21,  is  much  smaller.  It  will  be  readily 
appreciated  that  whenever  there  is  a  large  number  of  terms 
the  summation  method,  and  especially  the  form  of  it  given  in 
Table  IV. (A),  is  a  very  great  improvement  on  the  product 
method  of  calculating  moments.  By  means  of  an  adding 
machine, such as Burroughs' adding machine, the summations
can  be  obtained  mechanically  with  little  trouble,  even  for 
series  containing  as  many  as  a  hundred  terms. 
13. It is now necessary to consider the calculation of moments
from the curve, for until this has been done it is impossible to
form equations for finding the constants.

Let $y_x = f(x, a, b, c \ldots)$ where a, b, c ... are constants to be
determined.

We have seen, on pp. 13 and 14, that one way of working
would be to find

$$f(1, a, b, c \ldots) \times 1^n + f(2, a, b, c \ldots) \times 2^n + \ldots$$

say, $\Sigma f(x, a, b, c \ldots) \times x^n$,

and this would give a result which might be used in forming
equations if it were not for the fact that it is often impossible
to find an algebraic expression for the sum of such a
series in terms of the constants. It is, however, generally
possible to find such an expression for the integral, and as we
have defined the nth moment of an ordinate $y_x$ as $y_x x^n$, the
nth moment of the whole distribution from x = h to x = k is

$$\int_h^k y_x x^n\,dx \quad \text{or} \quad \int_h^k f(x, a, b, c \ldots)\,x^n\,dx.$$

The total frequency (i.e., total number of cases investigated)
is $\int_h^k y_x\,dx$, and the mean is $\int_h^k y_x x\,dx \div \int_h^k y_x\,dx$, as we have
already noticed.

14.   If    the    moments    from    the    equation    to    the    curve    are 
calculated  in  this  way  and  equated  to  the  moments  calculated 
from  statistics  by  assuming  that  the  latter  consist  of  a  series 
of  ordinates,  an  inaccuracy  is  introduced. 
Let  us  consider  the  two  cases  : — 

(1) When the statistics are a system of isolated terms
    or ordinates* and we wish to pass a curve very
    closely through them.

(2) When they are a system of areas but the moments
    are calculated by assuming the areas to be
    concentrated at the middle points of the bases.

* Strictly speaking, not a frequency distribution but a series of values
requiring graduation. The distributions referred to on p. 5 have to be dealt
with  as  areas  for  frequency-curve  work  because  they  tell  the  way  the  whole 
number  of  cases  is  divided  in  groups,  and  the  whole  area  between  the  curve 
and  the  axis  of  x  must  therefore  be  used. 
15. (1) In this case the terms $y_0, y_1, y_2 \ldots y_{n-1}$ are given by
the statistics, and since $\int_{-1/2}^{1/2} y_x\,dx$ is approximately equal to $y_0$, it
is simplest* to assume that $\int_{-1/2}^{n-1/2} y_x\,dx$ is given by the equation to
the curve, and we have to find adjustments to counteract the
error caused by equating $\sum_{x=0}^{x=n-1} X^t y_x$ to $\int_{-1/2}^{n-1/2} X^t y_x\,dx$ (the error is
analogous to that introduced by assuming $(1+i)^{1/2} a_{\overline{n}|} = \bar{a}_{\overline{n}|}$).
The most practical way of overcoming the difficulty is by
calculating the true area corresponding to the ordinates
$y_0, y_1 \ldots y_{n-1}$ by means of a quadrature formula (formula of
approximate summation). Many formulas are well known,
and some have been given for use in rather special circumstances
in Text-Book, Part II., pp. 480-491; but for the
present purpose it is convenient to have expressions
which give approximate values of an area in terms of
ordinates lying both within and without the base on
which the area to be valued stands. Symbolically, these
formulas express $\int_{-1/2}^{1/2} y_x\,dx$ in terms of $y_{-1/2}, y_{1/2}, y_{-3/2}, y_{3/2}$, &c.,
or $y_0, y_1, y_{-1}, y_2, y_{-2}$, &c.

* It is generally possible to use these limits in case (1), but if other limits
have to be taken, such as 0 to n, different quadrature formulae must be used.

I.— Let

$$y_x = a + bx + cx^2 + dx^3 + ex^4,$$

then

$$y_0 = a$$
$$y_{-1} + y_1 = 2(a + c + e)$$

and

$$y_{-2} + y_2 = 2(a + 4c + 16e).$$

Now, assume the required integral can be equated to

$$h y_0 + k(y_{-1} + y_1) + l(y_{-2} + y_2),$$

substitute the values given just above and equate the
coefficients of a, c and e respectively to 1, $\frac{1}{12}$ and $\frac{1}{80}$, and
we have

$$h + 2k + 2l = 1$$
$$2k + 8l = \tfrac{1}{12}$$
$$2k + 32l = \tfrac{1}{80}$$

The solution of these equations gives

$$h = \frac{5178}{5760}, \quad k = \frac{308}{5760}, \quad \text{and} \quad l = -\frac{17}{5760},$$

and we obtain

$$\int_{-1/2}^{1/2} y_x\,dx = \frac{1}{5760}\{5178y_0 + 308(y_{-1} + y_1) - 17(y_{-2} + y_2)\}.$$
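The coefficients of formula I. may be verified in exact arithmetic. The values 1, 1/12 and 1/80 are the integrals of 1, $x^2$ and $x^4$ between -1/2 and 1/2, and the odd powers contribute nothing either to the integral or to the symmetrical pairs of ordinates. The Python sketch below solves for h, k and l and tests the formula on a fourth-degree polynomial whose coefficients are hypothetical; the function names are assumptions made for illustration.

```python
from fractions import Fraction

# Conditions: h + 2k + 2l = 1, 2k + 8l = 1/12, 2k + 32l = 1/80.
l = (Fraction(1, 80) - Fraction(1, 12)) / 24
k = (Fraction(1, 12) - 8 * l) / 2
h = 1 - 2 * k - 2 * l

def formula_one(y):
    """Approximate the area between -1/2 and 1/2 from the ordinates at -2, -1, 0, 1, 2."""
    return h * y(0) + k * (y(-1) + y(1)) + l * (y(-2) + y(2))

def exact_area(coeffs):
    """Exact integral between -1/2 and 1/2 of sum(coeffs[p] * x**p)."""
    half = Fraction(1, 2)
    return sum(c * (half ** (p + 1) - (-half) ** (p + 1)) / (p + 1)
               for p, c in enumerate(coeffs))

coeffs = [3, -1, 4, 2, 5]                      # hypothetical a, b, c, d, e

def quartic(x):
    return sum(c * Fraction(x) ** p for p, c in enumerate(coeffs))

print(h * 5760, k * 5760, l * 5760)            # 5178 308 -17
print(formula_one(quartic), exact_area(coeffs))
```

For any polynomial of the fourth degree the two areas printed on the last line are identical.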

II.— If 

r*         i  ^ 

III.— If 

$y_x = a + bx + cx^2 + dx^3 + ex^4$

IV.— If 

16. We can now take the calculation of the moments, where
$\int_{-1/2}^{n-1/2} y_x\,dx$ is required in terms of $y_0, y_1 \ldots y_{n-1}$.

Now,

$$\int_{-1/2}^{n-1/2} y_x\,dx = \int_{-1/2}^{1/2} y_x\,dx + \int_{1/2}^{3/2} y_x\,dx + \ldots + \int_{n-3/2}^{n-1/2} y_x\,dx.$$

If formula I. be applied it can be used for all the integrals
on the right-hand side of this equation except the first two
and the last two, and the values of these are given by IV.
Summing the values obtained and writing IV. with the
denominator 5760, we obtain

$$\int_{-1/2}^{n-1/2} y_x\,dx = \frac{1}{5760}\{6463y_0 + 4371y_1 + 6669y_2 + 5537y_3 + 5760(y_4 + y_5 + \ldots + y_{n-6} + y_{n-5}) + 5537y_{n-4} + 6669y_{n-3} + 4371y_{n-2} + 6463y_{n-1}\} \qquad (V)$$

which means that we can multiply the first and last ordinates
by $\frac{6463}{5760}$ (= 1.1220485), the second and last but one
by $\frac{4371}{5760}$ (= .7588541), the third and last but two
by $\frac{6669}{5760}$ (= 1.1578125), the fourth and last but three
by $\frac{5537}{5760}$ (= .9612847), leave all the other ordinates
unaltered, and work out the moments in the usual way
from this modified series of ordinates. Of course, if there
are less than eight ordinates another formula must be
evolved.
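The modification of the end ordinates can be sketched in Python as follows, using exact fractions so that the weights can be tested on simple polynomials. The function name is an assumption made for illustration, and the test series are hypothetical. For ordinates taken from 1, x, $x^2$ or $x^3$ at x = 0, 1 ... 8 the modified series sums exactly to the area between -1/2 and 8 1/2.

```python
from fractions import Fraction

# Weights for the first four ordinates, repeated in reverse order at the other end.
END_WEIGHTS = [Fraction(w, 5760) for w in (6463, 4371, 6669, 5537)]

def modify_ordinates(ordinates):
    """Return the series with the four ordinates at each end weighted as in formula V."""
    n = len(ordinates)
    if n < 8:
        raise ValueError("this formula needs at least eight ordinates")
    weights = [Fraction(1)] * n
    for i, w in enumerate(END_WEIGHTS):
        weights[i] = w
        weights[n - 1 - i] = w
    return [w * y for w, y in zip(weights, ordinates)]

def exact_area(power, n):
    """Integral of x**power between -1/2 and n - 1/2."""
    lower, upper = Fraction(-1, 2), Fraction(2 * n - 1, 2)
    return (upper ** (power + 1) - lower ** (power + 1)) / (power + 1)

n = 9
areas = {p: sum(modify_ordinates([x ** p for x in range(n)])) for p in range(4)}

print(areas)
print({p: exact_area(p, n) for p in range(4)})
```

The four end weights sum to four, so that a series of equal ordinates is left with its total unaltered.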

17. In the following table the original series and the modified
one are set out in the first two columns, and in the other
columns the calculations of the first four moments about the
middle of the range by the direct method are shown:

Table V.

                    Modified by
                    Formula V.
   x      y_x          y'_x       y'_x x x    y'_x x x^2    y'_x x x^3    y'_x x x^4

  -4     51.81        58.13       -232.52       930.08      -3,720.32     14,881.28
  -3     43.74        33.19        -99.57       288.71        -866.13      2,598.39
  -2     35.58        41.01        -82.02       164.04        -328.08        656.16
  -1     27.80        26.72        -26.72        26.72         -26.72         26.72
   0     20.42        20.42          0            0              0             0
   1     13.79        13.26         13.26        13.26          13.26         13.26
   2      8.26         9.52         19.04        38.08          76.16        152.32
   3      4.29         3.26          9.78        29.34          88.02        264.06
   4      1.69         1.90          7.60        30.40         121.60        486.40

 Positive terms                    +49.68                      +299.04
 Negative terms                   -440.83                    -4,941.25
 Totals 208.38       207.41       -391.15     1,520.63       -4,642.21     19,078.59
207.41 is then treated as the total frequency, and the
moments for unit frequency ($\mu'_n$) would be obtained by
dividing -391.15, 1520.63, &c., by 207.41, and not by 208.38,
which is not the "total frequency", but merely gives the
uncorrected sum of certain equidistant values.

18. The work can sometimes be simplified considerably, for if
the values at the ends of the experience are very small and
have a tendency to keep close to the axis of x before they
finally vanish (i.e., if there is high contact; most actuarial
functions $l_x$, $a_x$, $D_x$, &c., have high contact at the old age
end of the table), then it is reasonable to suppose that
ordinates before the first and after the last exist, but
are insignificant in value. Thus the integral corresponding
to the whole series of ordinates can be legitimately extended
beyond the limits $-\frac{1}{2}$ and $n - \frac{1}{2}$ previously used, because the
additional area thus introduced will be evanescent. Now if
the area be so extended, the effect will be that in equation V
the significant ordinates from $y_0$ to $y_{n-1}$ will all have the
coefficient unity, and the ordinates with weighted coefficients
will all vanish. The practical result is, that if there is high
contact at one end of the statistics the adjustment need only
be made at the other end, while if there is high contact at
both ends no adjustment is necessary. Mathematically, high
contact means that the first few differential coefficients vanish
at the point of contact. The diagrams on pp. 73 and 90 show
high contact at both ends of the curves, and the diagram on
p. 67 shows high contact at the longer durations.

19. (2) The second case, namely, that in which mid-ordinates are used instead of areas, may now be examined. By concentrating areas about the middle points of their bases, we assume that the distances by which the areas

\[ \int_{-\frac12}^{\frac12} y_x\,dx;\qquad \int_{\frac12}^{\frac32} y_x\,dx,\ \text{\&c.,} \]

must be multiplied, are the same as the distances from $y_0$, $y_1$, &c.; that is, the $t$th moment from the statistics is

\[ X^t\int_{-\frac12}^{\frac12} y_x\,dx + (X+1)^t\int_{\frac12}^{\frac32} y_x\,dx + \dots + (X+n-1)^t\int_{n-\frac32}^{n-\frac12} y_x\,dx \]

and we require $\int (X+x)^t\,y_x\,dx$, where $X$ is the distance of $y_0$ from the ordinate about which moments are calculated. By formula I. the series of integrals can be written

\[ \frac{1}{5760}\Big\{\dots + \big[\,5178\,h^t + 308\{(h-1)^t + (h+1)^t\} - 17\{(h-2)^t + (h+2)^t\}\big]\,y_x + \dots\Big\} \]

where $h$ is written for $X + x$ in order to simplify the expression, and working out this general coefficient we have

If $t = 1$ this becomes $h$
,, $t = 2$ ,, ,, $h^2 + \tfrac{1}{12}$
,, $t = 3$ ,, ,, $h^3 + \tfrac14 h$
,, $t = 4$ ,, ,, $h^4 + \tfrac12 h^2 + \tfrac{1}{80}$
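The four coefficients can be checked by expanding the bracket with exact fractions. The sketch below assumes that formula I. uses the weights 5178, 308 and -17 with the divisor 5760 (the weights, counted for both sides, add up to 5760, which is what makes the $t = 1$ coefficient exactly $h$); the function name `coefficient_polynomial` is hypothetical.

```python
from fractions import Fraction
from math import comb

# Assumed weights of formula I.: offset 0, offsets +-1, offsets +-2.
WEIGHTS = {0: 5178, 1: 308, 2: -17}
DIVISOR = 5760


def coefficient_polynomial(t):
    """Return {power of h: coefficient} for the general coefficient of y
    in the t-th moment, i.e. the expanded bracket divided by 5760."""
    poly = {}
    for offset, weight in WEIGHTS.items():
        shifts = [0] if offset == 0 else [-offset, offset]
        for s in shifts:
            # Binomial expansion of (h + s)**t.
            for k in range(t + 1):
                term = Fraction(weight * comb(t, k) * s ** (t - k), DIVISOR)
                poly[k] = poly.get(k, Fraction(0)) + term
    return {k: v for k, v in poly.items() if v != 0}


for t in range(1, 5):
    print(t, coefficient_polynomial(t))
```

Exact fractions are used so that the constants 1/12, 1/4, 1/2 and 1/80 appear without rounding.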

It has already been noticed that if there is high contact, the value of $\int (X+x)^t\,y\,dx$ is found by using the unadjusted ordinates; that is, the second moment is given by a series, the general term of which is $h^2y$; the third by a series, the general term of which is $h^3y$, and so on; hence, if $\mu$ be written for the true adjusted moment about the mean and $\nu$ for the unadjusted moment, the relations between $\mu$ and $\nu$ are given by

\[ \mu_2 + \tfrac{1}{12} = \nu_2 \quad\text{or}\quad \mu_2 = \nu_2 - \tfrac{1}{12} \]

\[ \mu_4 + \tfrac12\mu_2 + \tfrac{1}{80} = \nu_4 \quad\text{or}\quad \mu_4 = \nu_4 - \tfrac12\nu_2 + \tfrac{7}{240} \]

The mean needs no adjustment, for if $t = 1$ the general term has the correct coefficient $h$, and the third moment has to be adjusted by $\tfrac14$ of the first moment, which is zero where the moments are taken about the mean. These adjustments were first given by Mr. W. F. Sheppard in Proceedings of the London Mathematical Society, vol. xxix., pp. 353-380. In order to demonstrate the correction for the $n$th moment by the above method, a parabola of at least the $n$th order is necessary. If we apply these adjustments to the moments found on p. 19, for Example IV. of Table I., we have $\mu_2 = 3.1503$, $\mu_3 = -1.430976$, and $\mu_4 = 28.828322$. These adjustments are found to make a considerable difference in the constants obtained from the moments, especially when there is a small number of terms.
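A minimal sketch of the adjustments follows. It assumes the moments are already taken about the mean with the interval between ordinates as unit. The fourth-moment relation used, $\mu_4 = \nu_4 - \tfrac12\nu_2 + \tfrac{7}{240}$, follows from the coefficient $h^4 + \tfrac12h^2 + \tfrac1{80}$ found above. The function name and the numerical inputs are illustrative only.

```python
def sheppard_adjust(nu2, nu3, nu4):
    """Return (mu2, mu3, mu4) from unadjusted moments about the mean.

    Assumes high contact at both ends and unit class interval."""
    mu2 = nu2 - 1 / 12
    mu3 = nu3                          # no adjustment about the mean
    mu4 = nu4 - nu2 / 2 + 7 / 240
    return mu2, mu3, mu4


# Hypothetical unadjusted moments, used only to exercise the function.
mu2, mu3, mu4 = sheppard_adjust(2.0, 0.5, 10.0)
print(mu2, mu3, mu4)
```

The two ways of writing the fourth-moment relation agree: substituting the adjusted values back into $\mu_4 + \tfrac12\mu_2 + \tfrac1{80}$ returns the unadjusted $\nu_4$.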
20. When there is not high contact at both ends of the curve, the adjustments become more difficult to value; suggestions have been made for finding the corrections, but they are not altogether satisfactory, and it is probably best in such cases to use the unadjusted figures. A few particular cases are, however, dealt with in certain articles of Chap. V.

A student should calculate the moments for one or two distributions, and make the necessary adjustments; he can also find the standard deviations of distributions, for the S.D. $= \sqrt{\mu_2}$, where the $\mu_2$ has been adjusted in accordance with the above rules. In Examples III. and IV. there is clearly high contact, in II. and V. there is more doubt but the adjustment is advisable, while in I. the rough moment should be used.
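21. Before proceeding to deal with fitting more complicated curves it is advisable to consider the application of the method of moments to a simple case, namely, when $y = a + bx + cx^2 +$ &c.

Let the range be $2l$, and let the origin be at the middle point of the range, and $m_0$ stand for the area and $m_n$ for the $n$th moment of the whole distribution about the middle of the range.

Then

\[ m_{2s} = \int_{-l}^{l} (a + bx + cx^2 + \dots)\,x^{2s}\,dx = 2l \times l^{2s}\left\{\frac{a}{2s+1} + \frac{cl^2}{2s+3} + \frac{el^4}{2s+5} + \dots\right\} \]

and similarly

\[ m_{2s+1} = 2l \times l^{2s+1}\left\{\frac{bl}{2s+3} + \frac{dl^3}{2s+5} + \dots\right\} \]

These equations show that the even moments give the constants $a$, $c$, $e$, &c., and the odd moments give the constants $b$, $d$, $f$, &c. This is, of course, the result of using moments about the middle of the range, and makes the solution of the equations less laborious than it would otherwise have been. The solution can also be simplified a little by writing

\[ \frac{1}{2l}\cdot\frac{m_{2s}}{l^{2s}} = \frac{a}{2s+1} + \frac{cl^2}{2s+3} + \dots \]

so that

\[ \frac{1}{2l}\,m_0 = a + \frac{cl^2}{3} + \frac{el^4}{5} + \dots \]
\[ \frac{1}{2l}\cdot\frac{m_2}{l^2} = \frac{a}{3} + \frac{cl^2}{5} + \frac{el^4}{7} + \dots \]
\[ \frac{1}{2l}\cdot\frac{m_4}{l^4} = \frac{a}{5} + \frac{cl^2}{7} + \frac{el^4}{9} + \dots \qquad\text{II.} \]

and similarly

\[ \frac{1}{2l}\cdot\frac{m_1}{l} = \frac{bl}{3} + \frac{dl^3}{5} + \frac{fl^5}{7} + \dots \]
\[ \frac{1}{2l}\cdot\frac{m_3}{l^3} = \frac{bl}{5} + \frac{dl^3}{7} + \frac{fl^5}{9} + \dots \]
\[ \frac{1}{2l}\cdot\frac{m_5}{l^5} = \frac{bl}{7} + \frac{dl^3}{9} + \frac{fl^5}{11} + \dots \]

The solution of these equations gives the constants required, for example:

(i.) if $y = a + bx$, we have

\[ a = \frac{1}{2l}\,m_0, \qquad b = \frac{3}{l}\cdot\frac{1}{2l}\cdot\frac{m_1}{l} \]

(ii.) if $y = a + bx + cx^2$,

\[ a = \frac34\left\{\frac{3}{2l}\,m_0 - \frac{5}{2l}\cdot\frac{m_2}{l^2}\right\} \]
\[ b = \frac{3}{l}\cdot\frac{1}{2l}\cdot\frac{m_1}{l} \]
\[ c = \frac{15}{4l^2}\left\{-\frac{1}{2l}\,m_0 + \frac{3}{2l}\cdot\frac{m_2}{l^2}\right\} \]

(iii.) if $y = a + bx + cx^2 + dx^3$,

\[ a = \frac34\left\{\frac{3}{2l}\,m_0 - \frac{5}{2l}\cdot\frac{m_2}{l^2}\right\} \]
\[ b = \frac{15}{4l}\left\{\frac{5}{2l}\cdot\frac{m_1}{l} - \frac{7}{2l}\cdot\frac{m_3}{l^3}\right\} \]
\[ c = \frac{15}{4l^2}\left\{-\frac{1}{2l}\,m_0 + \frac{3}{2l}\cdot\frac{m_2}{l^2}\right\} \]
\[ d = \frac{35}{4l^3}\left\{-\frac{3}{2l}\cdot\frac{m_1}{l} + \frac{5}{2l}\cdot\frac{m_3}{l^3}\right\} \]

The above results, which can easily be extended if it is wished, may now be applied to one or two numerical examples.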
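The closed forms above are particular solutions of a set of simultaneous equations, so an extension to higher orders can be sketched by solving those equations directly. The Python below is an illustration under stated assumptions, not the author's procedure: it equates the moments of a polynomial over the range $-l$ to $+l$ with supplied values and solves by elimination with exact fractions. The names `monomial_moment` and `fit_by_moments`, and the test cubic, are introduced here for illustration only.

```python
from fractions import Fraction


def monomial_moment(power, half_range):
    """Integral of x**power from -l to +l (zero when the power is odd)."""
    if power % 2:
        return Fraction(0)
    return 2 * Fraction(half_range) ** (power + 1) / (power + 1)


def fit_by_moments(moments, half_range):
    """Coefficients [a, b, c, ...] of the polynomial whose moments
    m_0, m_1, ... about the middle of the range equal `moments`."""
    k = len(moments)
    # Row n: sum_j coeff_j * integral(x**(n + j)) = m_n
    rows = [[monomial_moment(n + j, half_range) for j in range(k)] + [Fraction(m)]
            for n, m in enumerate(moments)]
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(k):
            if r != col:
                factor = rows[r][col]
                rows[r] = [u - factor * v for u, v in zip(rows[r], rows[col])]
    return [rows[i][k] for i in range(k)]


# Check on a hypothetical cubic: form its exact moments, then recover it.
l = Fraction(9, 2)
true_coeffs = [Fraction(2), Fraction(3), Fraction(1, 2), Fraction(-1, 10)]
exact_moments = [sum(c * monomial_moment(n + j, l) for j, c in enumerate(true_coeffs))
                 for n in range(4)]
print(fit_by_moments(exact_moments, l))
```

Because odd powers integrate to zero over a symmetrical range, the even and odd constants separate exactly as the text describes.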

22. As a first example, we shall graduate the statistics in Table V., Art. 17, for which the moments about the middle of the range have been calculated. Taking the curve $y = a + bx + cx^2$, the following values from Table V. will be required:

\[ 2l = 9 \quad\text{or}\quad l = 4.5 \]
\[ m_0 = 207.41, \qquad m_1 = -391.15, \qquad m_2 = 1520.63 \]

Hence

\[ a = \frac34\left\{\frac{622.23}{9} - \frac59 \times \frac{1520.63}{(4.5)^2}\right\} = 20.563 \]
\[ b = \frac{3}{4.5}\cdot\frac19\cdot\frac{(-391.15)}{4.5} = -6.4387 \]
\[ c = \frac{15}{4(4.5)^2}\left\{-\frac{207.41}{9} + \frac39 \times \frac{1520.63}{(4.5)^2}\right\} = .36815 \]

23. The best way to obtain the ordinates corresponding to this graduation is by calculating $b + c$ the first difference, and $2c$ the second difference, from the middle term; their values are -6.0706 and .7363 respectively. Since second differences are constant, the work is done continuously, and is as follows:

      y          Δ         Δ²

    52.208     -9.016     .736
    43.192     -8.279
    34.913     -7.543
    27.370     -6.807
    20.563     -6.071
    14.492     -5.335
     9.157     -4.599
     4.558     -3.862
      .696
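The continuous working by differences can be sketched as follows. The constants are those printed in Art. 22; the function name `graduate_by_differences` is an assumption made for this illustration. Because the sketch carries full precision at each step, its figures may differ from the printed column in the last decimal place.

```python
def graduate_by_differences(a, b, c, half_width):
    """Ordinates of y = a + b*x + c*x**2 at x = -half_width, ..., +half_width,
    built outwards from the middle term by first and second differences."""
    second = 2 * c
    # Forward from the middle term: y(1) - y(0) = b + c.
    forward = [a]
    delta = b + c
    for _ in range(half_width):
        forward.append(forward[-1] + delta)
        delta += second
    # Backward from the middle term: y(0) - y(-1) = b - c.
    backward = [a]
    delta = b - c
    for _ in range(half_width):
        backward.append(backward[-1] - delta)
        delta -= second
    # backward holds y(0), y(-1), ...; reverse it and drop the duplicate y(0).
    return backward[:0:-1] + forward


ordinates = graduate_by_differences(20.563, -6.4387, 0.36815, 4)
for x, y in zip(range(-4, 5), ordinates):
    print(f"{x:+d}  {y:8.3f}")
```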

These graduated figures will be found to agree fairly well with those given in the first column of Table V.

24. As a further example the following statistics, taken from a paper by S. H. J. W. Allin (Journal of the Institute of Actuaries, xxxix., p. 350), and giving the values of annuities to widows in pension funds according to the age of the member, may be considered:


  Age.   Value of    Modified by    Distance from     a' × d      a' × d^2      a' × d^3
         Annuity.    Formula V.,    middle of range
                     p. 27          multiplied by 2.
                     a'             d

   27     21.20       23.79           -7              166.53      1165.71       8159.97
   32     19.91       15.11           -5               75.55       377.75       1888.75
   37     19.34       22.40           -3               67.20       201.60        604.80
   42     18.58       17.86           -1               17.86        17.86         17.86
                                                     -327.14                  -10671.38
   47     16.74       16.09           +1               16.09        16.09         16.09
   52     15.69       18.17           +3               54.51       163.53        490.59
   57     14.70       11.15           +5               55.75       278.75       1393.75
   62     12.99       14.58           +7              102.06       714.42       5000.94
                                                     +228.41                   +6901.37

                     139.15                           -98.73      2935.71      -3770.01

In calculating the above moments it has been assumed that the figures to be graduated represent a system of ordinates; if they had represented a system of areas the adjustment by formula V. would have been unsuitable.

When there is an even number of terms the difficulty of calculating the moments about the middle of the range is that the terms have to be multiplied by .5, 1.5, 2.5, &c., and if the series to be graduated contains only a few terms, it is best to deal with the distance $d$, in the way shown above, and then divide the totals by 2, 4 and 8, in order to obtain the first, second and third moments respectively. In this way, we have

\[ l = 4, \qquad m_0 = 139.15, \qquad m_1 = -49.36, \qquad m_2 = 733.93, \qquad m_3 = -471.25 \]

We will now fit the statistics with each of the three curves, the formulae for which have been given, and compare the resulting graduations.

(i.) $y = 17.394 - 1.157x$
(ii.) $y = 17.633 - 1.157x - .0451x^2$
(iii.) $y = 17.633 - 1.190x - .0451x^2 + .0035x^3$

The following table shows the graduations:


  Age.    Ungraduated.     (i.)      (ii.)     (iii.)

   27        21.20        21.44     21.13     21.13
   32        19.91        20.29     20.24     20.28
   37        19.34        19.13     19.27     19.31
   42        18.58        17.97     18.20     18.22
   47        16.74        16.82     17.04     17.02
   52        15.69        15.66     15.80     15.76
   57        14.70        14.50     14.46     14.43
   62        12.99        13.34     13.03     13.05

Formulae (ii.) and (iii.) are practically identical, and both are considerably closer to the original figures than (i.).
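The working of Art. 24 can be retraced with a short Python sketch. It is offered only as an illustration: the figures in `modified` are the values of the annuity as modified by formula V., the closed forms are those of Art. 21, and the helper names (`moments_from_doubled`, `fit_curves`, `evaluate`) are assumptions of this sketch. Since full precision is carried throughout, the constants agree with the printed ones only to about the number of figures printed.

```python
# Column a' (annuity values modified by formula V.) and doubled distances d.
modified = [23.79, 15.11, 22.40, 17.86, 16.09, 18.17, 11.15, 14.58]
doubled = [-7, -5, -3, -1, 1, 3, 5, 7]


def moments_from_doubled(values, d):
    """m_t = sum(a' * d**t) / 2**t for t = 0, 1, 2, 3."""
    return [sum(v * di ** t for v, di in zip(values, d)) / 2 ** t
            for t in range(4)]


def fit_curves(m, l):
    """Constants of curves (i.), (ii.) and (iii.) from the closed forms of Art. 21."""
    m0, m1, m2, m3 = m
    A0, A2 = m0 / (2 * l), m2 / (2 * l * l ** 2)
    B1, B3 = m1 / (2 * l * l), m3 / (2 * l * l ** 3)
    b_line = 3 * B1 / l
    a = 0.75 * (3 * A0 - 5 * A2)
    c = 15 / (4 * l ** 2) * (3 * A2 - A0)
    b_cubic = 15 / (4 * l) * (5 * B1 - 7 * B3)
    d_cubic = 35 / (4 * l ** 3) * (5 * B3 - 3 * B1)
    return (A0, b_line), (a, b_line, c), (a, b_cubic, c, d_cubic)


def evaluate(coeffs, x):
    return sum(coef * x ** k for k, coef in enumerate(coeffs))


m = moments_from_doubled(modified, doubled)
curves = fit_curves(m, l=4)
for age, d in zip(range(27, 63, 5), doubled):
    x = d / 2                      # true distance from the middle of the range
    print(age, [round(evaluate(cf, x), 2) for cf in curves])
```

Working with the doubled distances keeps every multiplier an odd whole number, which is the convenience the text points out for a series with an even number of terms.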
25. The results obtained so far may be summarized as follows:

(1) The method of moments is a general method of finding the constants in a formula suitable to a particular statistical example, and it consists of equating the values of $\Sigma f(n) \times n^t$ (which is called the $t$th moment, and is summed for all values of $n$ that occur) to similar expressions obtained from the graduation formula. These latter expressions will be algebraic, and simultaneous equations have to be solved in order to find the arithmetical constants.

(2) The moments from the statistics can be calculated by multiplying the frequencies by appropriate values of $n^t$, or by Mr. G. F. Hardy's summation method.

(3) If moments have been obtained about any one vertical, they can be transferred to any other by the formulae in Art. 6 of Chap. III.

(4) Since the moments from the graduation formula must generally be found by means of the integral calculus, while those from the statistics are found by summation, the latter have to be adjusted before the equations for obtaining the constants can be correctly formed. The adjustments depend on whether the statistics are a system of ordinates or a system of areas; in the former case adjustment is made by equation V., and in the latter by the formulae in Art. 19 (Sheppard's adjustments), if there is high contact at both ends of the curve.


CHAPTER IV.


Frequency-Curves. 

1. When it becomes necessary in practical work to decide on a system of curves for describing frequency distributions, we have to bear in mind that

(1) Any expression used must be a graduation formula; it must remove the roughness of the material.

(2) There must not be so many constants in the formula that we require a great number of moments, for this means that the accuracy is reduced. The higher the moment the more liable it is to error when deduced from ungraduated observations; this is clear, when we remember that the ends of the experiences are multiplied by the highest numbers and their powers.

(3) There must be a systematic method of approaching frequency distributions.

2. Now, considering the more obvious characteristics of frequency distributions, we find they generally start at zero, rise to a maximum, and then fall sometimes at the same but often at a different rate. At the ends of the distribution there is often high contact. This means, mathematically, that a series of equations $y = f(x)$; $y = \phi(x)$, &c., must be chosen, so that in each equation of the series $\frac{dy}{dx} = 0$ in certain cases; at the maximum (for the test of a maximum is that the first differential coefficient is zero and the second negative) and when $y = 0$, for there is to be contact at one end, at least, of the range of the distribution, or, in other words, the angle formed by the tangent to the curve at this point must be zero, in order that the tangent of the angle (i.e., differential coefficient) may be zero. In non-geometrical language, the finite difference between two successive ordinates must be zero, or there will not be contact.

The above suggests that $\frac{dy}{dx}$ may be put equal to $\frac{y(x+a)}{F(x)}$; then, if $y = 0$, $\frac{dy}{dx} = 0$, and if $x = -a$, $\frac{dy}{dx} = 0$, and we have the maximum we require. So long as $F(x)$ is general the form assumed for $\frac{dy}{dx}$ is extremely general and includes cases when $\frac{dy}{dx}$ may not be zero when $y$ is zero. $F(x)$ is expanded by Maclaurin's theorem in ascending powers of $x$, and we have

\[ \frac{dy}{dx} = \frac{y(x+a)}{b_0 + b_1x + b_2x^2 + \dots} \qquad\text{I.} \]

We shall return to this equation and show how it can be put in the form $y = f(x)$, so as to express $y$ as a direct function of $x$; but as the matter has up to the present been approached from an experimental point of view, it will be interesting to see how equation I. can be obtained up to the $x^2$ term in the denominator from elementary propositions in the theory of probabilities.

3. If $p$ be the probability of an event happening and $q$ the probability of its failing, then the probabilities of its happening once, twice, and so on out of $n$ trials are given by the terms of the expansion $(p+q)^n$; or if we have $N$ cases, the terms of $N(p+q)^n$ give the frequency distribution of the $N$ cases into $n$ groups. The binomial series does not represent nearly all the probabilities that arise, and another series that occurs is the hypergeometrical. Thus the chances of getting $r$, $r-1$, ... 0 black balls from a bag containing $pn$ black and $qn$ white balls when $r$ balls are drawn, are given by the successive terms of the series

\[ \frac{pn(pn-1)\dots(pn-r+1)}{n(n-1)\dots(n-r+1)}\left\{1 + \frac{r\cdot qn}{pn-r+1} + \frac{r(r-1)}{2!}\cdot\frac{qn(qn-1)}{(pn-r+1)(pn-r+2)} + \dots\right\} \]

A numerical example may help to make the way the series arises clear. A bag contains seven balls, of which four are black and three white; then if three balls are drawn the probability that

all will be black is $\dfrac{4\cdot3\cdot2}{7\cdot6\cdot5}$

two will be black is $\dfrac{4\cdot3\cdot3}{7\cdot6\cdot5} \times {}_3C_1$

one will be black is $\dfrac{4\cdot3\cdot2}{7\cdot6\cdot5} \times {}_3C_2$

none will be black is $\dfrac{3\cdot2\cdot1}{7\cdot6\cdot5}$

The sum of these four expressions is unity. The terms can be seen to agree with the series by putting $n = 7$, $pn = 4$, $qn = 3$, and $r = 3$.
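The bag example can be verified with exact fractions. The sketch below is illustrative only; it assumes at least as many black balls as balls drawn (so that no divisor vanishes), and the two function names are hypothetical. One function counts combinations directly, the other builds the same probabilities term by term from the series.

```python
from fractions import Fraction
from math import comb


def hypergeometric_terms(black, white, drawn):
    """Probabilities of drawing drawn, drawn-1, ..., 0 black balls,
    found by counting combinations."""
    total = comb(black + white, drawn)
    return [Fraction(comb(black, drawn - w) * comb(white, w), total)
            for w in range(drawn + 1)]


def series_terms(black, white, drawn):
    """The same probabilities built from successive terms of the series
    (pn = black, qn = white, r = drawn)."""
    n = black + white
    first = Fraction(1)
    for i in range(drawn):
        first *= Fraction(black - i, n - i)
    terms = [first]
    for x in range(1, drawn + 1):
        # Ratio of each term to the one before it.
        ratio = Fraction((drawn - x + 1) * (white - x + 1),
                         x * (black - drawn + x))
        terms.append(terms[-1] * ratio)
    return terms


probabilities = hypergeometric_terms(4, 3, 3)
print(probabilities, sum(probabilities))
```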

Other series may arise, but those given will be sufficient for the present purpose, and we shall proceed to consider how they can be put in the form of equation I. The inconvenience of the expressions as they now stand becomes fairly obvious when an attempt is made to calculate numerical values for a large number of groups, and besides this, they are not continuous, while the statistics of practical work often are.

Considering the hypergeometrical series, and remembering that the function required for equation I. is $\frac1y\frac{dy}{dx}$, and as the series is discontinuous, finite differences must be used, we have

\[ y_x = \frac{pn(pn-1)\dots(pn-r+1)}{n(n-1)\dots(n-r+1)}\cdot\frac{r(r-1)\dots(r-x+2)}{(x-1)!}\cdot\frac{qn(qn-1)\dots(qn-x+2)}{(pn-r+1)(pn-r+2)\dots(pn-r+x-1)} \]

\[ \Delta y_x = y_{x+1} - y_x = y_x\left\{\frac{r-x+1}{x}\cdot\frac{qn-x+1}{pn-r+x} - 1\right\} = y_x\left\{\frac{(r+1)(qn+1) - x(n+2)}{x(pn-r+x)}\right\} \quad\text{for } p+q = 1 \]

and

\[ \frac{\Delta y_x}{y_{x+\frac12}} = \frac{2\{(r+1)(qn+1) - x(n+2)\}}{(r+1)(qn+1) - x\{2(r+1) + n(q-p)\} + 2x^2}, \]

where $y_{x+\frac12}$ stands for the mean of $y_x$ and $y_{x+1}$, which may be put in the form of equation I.,

\[ \frac1y\frac{dy}{dx} = \frac{a+x}{b_0 + b_1x + b_2x^2}. \]

4. Returning to equation I., we see that it can be written in the form

\[ (b_0 + b_1x + b_2x^2 + \dots)\frac{dy}{dx} = y(x+a); \]

multiplying each side by $x^n$, and integrating with respect to $x$, we have

\[ \int x^n(b_0 + b_1x + b_2x^2 + \dots)\frac{dy}{dx}\,dx = \int y(x+a)x^n\,dx. \]

Integrate the left-hand side by parts treating $\frac{dy}{dx}$ as one part, and the right-hand side as the sum of two functions, and then

\[ x^n(b_0 + b_1x + b_2x^2 + \dots)\,y - \int\{nb_0x^{n-1} + (n+1)b_1x^n + (n+2)b_2x^{n+1} + \dots\}\,y\,dx = \int yx^{n+1}\,dx + a\int yx^n\,dx \]

or since $y = 0$ at the ends of the range of the curve the expression $x^n(b_0 + b_1x + b_2x^2 + \dots)y$ vanishes, and using the notation we have already adopted, namely,

\[ \mu'_n = \int yx^n\,dx, \]

we have

\[ -nb_0\mu'_{n-1} - (n+1)b_1\mu'_n - (n+2)b_2\mu'_{n+1} - \dots = \mu'_{n+1} + a\mu'_n. \]

If we put $n = 0, 1, 2 \dots s$ respectively, we get $s+1$ equations to enable us to find $a$, $b_0$, $b_1$ ... &c., in terms of the moments ($\mu'$) as shown by the following equations, which have been obtained by writing the equation in the form

\[ a\mu'_n + nb_0\mu'_{n-1} + (n+1)b_1\mu'_n + (n+2)b_2\mu'_{n+1} + \dots = -\mu'_{n+1} \]

and then putting $n = 0$, 1, 2, &c.

\[ a\mu'_0 + 0 \times b_0 + b_1\mu'_0 + 2b_2\mu'_1 + \dots = -\mu'_1 \]
\[ a\mu'_1 + b_0\mu'_0 + 2b_1\mu'_1 + 3b_2\mu'_2 + \dots = -\mu'_2 \]
\[ a\mu'_2 + 2b_0\mu'_1 + 3b_1\mu'_2 + 4b_2\mu'_3 + \dots = -\mu'_3 \]
\[ a\mu'_3 + 3b_0\mu'_2 + 4b_1\mu'_3 + 5b_2\mu'_4 + \dots = -\mu'_4 \qquad\text{II.} \]


&c., &c.

Let us now make $\mu'_1 = 0$, and alter the other moments in the way indicated in Chap. II., for the result of making $\mu'_1 = 0$ is to change the origin of the system to the mean of the distribution. We can also treat $\mu_0$ as 1, and these simplifications lead to the following results:

(1) Keeping $b_0$ only, we have

\[ \frac1y\frac{dy}{dx} = -\frac{x}{\mu_2}. \]

(2) Keeping $b_0$ and $b_1$, the first three equations in the system II. above give

\[ a + b_1 = 0 \]
\[ b_0 = -\mu_2 \]
\[ \text{and}\quad a\mu_2 + 3b_1\mu_2 = -\mu_3 \]
\[ \text{or}\quad b_1 = -\frac{\mu_3}{2\mu_2} \quad\text{and}\quad a = \frac{\mu_3}{2\mu_2} \]

and the differential equation becomes

\[ \frac1y\frac{dy}{dx} = -\frac{x + \dfrac{\mu_3}{2\mu_2}}{\mu_2 + \dfrac{\mu_3}{2\mu_2}\,x}. \]

(3) Keeping $b_0$, $b_1$, $b_2$, the system gives

\[ a + b_1 = 0 \]
\[ b_0 + 3b_2\mu_2 = -\mu_2 \]
\[ a\mu_2 + 3b_1\mu_2 + 4b_2\mu_3 = -\mu_3 \]
\[ a\mu_3 + 3b_0\mu_2 + 4b_1\mu_3 + 5b_2\mu_4 = -\mu_4. \]

The solution of these simultaneous equations is perfectly straightforward, and leads to

\[ \frac1y\frac{dy}{dx} = -\frac{x + \dfrac{\mu_3(\mu_4 + 3\mu_2^2)}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2}}{\dfrac{\mu_2(4\mu_2\mu_4 - 3\mu_3^2)}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2} + \dfrac{\mu_3(\mu_4 + 3\mu_2^2)}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2}\,x + \dfrac{2\mu_2\mu_4 - 3\mu_3^2 - 6\mu_2^3}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2}\,x^2} \]

In this last form put $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$ and $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ and

\[ \frac1y\frac{dy}{dx} = -\frac{x + \dfrac{\sqrt{\mu_2}\sqrt{\beta_1}(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}}{\dfrac{\mu_2(4\beta_2 - 3\beta_1) + \sqrt{\mu_2}\sqrt{\beta_1}(\beta_2 + 3)\,x + (2\beta_2 - 3\beta_1 - 6)\,x^2}{2(5\beta_2 - 6\beta_1 - 9)}} \qquad\text{III.} \]

5. The reasoning by which equation I. was first obtained showed that $a$ is the distance between the origin and the mode, or as the origin has now been transferred to the mean by putting $\mu'_1 = 0$, $a$ is the distance between the mean and the mode. This distance in terms of the moments is, therefore,

\[ \frac{\sigma\sqrt{\beta_1}(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)} \]

where $\sigma$ is the standard deviation $\sqrt{\mu_2}$.

Since the skewness is the distance between the mean and mode divided by the standard deviation,

\[ \text{skewness} = \frac{\sqrt{\beta_1}(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}. \]
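The solution for the case in which $b_0$, $b_1$, $b_2$ are kept can be checked by substituting it back into the four equations of system II. The Python below is a sketch under the same simplifications as the text ($\mu_0 = 1$, $\mu'_1 = 0$); the function names and the trial moments are hypothetical, and the skewness is given the sign of $\mu_3$, which is an assumption about the sign of $\sqrt{\beta_1}$.

```python
def pearson_constants(mu2, mu3, mu4):
    """a, b0, b1, b2 of (1/y) dy/dx = (x + a) / (b0 + b1*x + b2*x**2),
    with the origin at the mean."""
    D = 10 * mu2 * mu4 - 18 * mu2 ** 3 - 12 * mu3 ** 2
    a = mu3 * (mu4 + 3 * mu2 ** 2) / D
    b0 = -mu2 * (4 * mu2 * mu4 - 3 * mu3 ** 2) / D
    b1 = -a
    b2 = -(2 * mu2 * mu4 - 3 * mu3 ** 2 - 6 * mu2 ** 3) / D
    return a, b0, b1, b2


def residuals(mu2, mu3, mu4):
    """Left side minus right side of the four equations (all should vanish)."""
    a, b0, b1, b2 = pearson_constants(mu2, mu3, mu4)
    return [a + b1,
            b0 + 3 * b2 * mu2 + mu2,
            a * mu2 + 3 * b1 * mu2 + 4 * b2 * mu3 + mu3,
            a * mu3 + 3 * b0 * mu2 + 4 * b1 * mu3 + 5 * b2 * mu4 + mu4]


def skewness(mu2, mu3, mu4):
    """Distance between mean and mode divided by the standard deviation."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    sign = 1 if mu3 >= 0 else -1
    return sign * beta1 ** 0.5 * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))


# Hypothetical moments about the mean, used only to exercise the functions.
print(pearson_constants(2.0, 1.0, 15.0))
print(residuals(2.0, 1.0, 15.0))
print(skewness(2.0, 1.0, 15.0))
```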

6. It would be possible to obtain constants in the differential equation I. by using a greater number of terms and retaining $b_3$, $b_4$, &c., but there are strong practical objections to such a course. Besides the large increase in arithmetical work, the gain in introducing additional constants is not great because the higher moments become untrustworthy, owing, as we have already noticed, to their probable errors being very large. Professor Pearson has shown* that "we might easily on a random sample reach a 7th or 8th moment having half or double the value it actually has in the general population. Constants based on these high moments will be practically idle. They may enable us to describe closely an individual random sample, but no safe argument can be drawn from this individual sample as to the general population at large, at any rate so far as the argument is based on the constants depending on these high moments." In some actuarial statistics where there are as many as 100,000 cases, it might be worth while to go as far as the next term of the series, but even here the value of the work is discounted because any other smaller body of statistics on the same subject could not be compared satisfactorily with the result. For practical purposes it is probable that the equation taken as far as $b_2$ will be sufficient, and we shall confine our attention to the forms thus obtained, merely remarking that in some extreme cases in graduation another term might be required.

*"Skew Correlation and non-linear Regression," Drapers' Company Research Memoir, 1905, p. (J.

7. Turning to the particular form of equation I. given in equation III. it will be seen that it is possible to obtain a formula representing the statistics by inserting in that equation the values of the moments found from the statistics, but this would not give a graduation in the same form as that in which the original data appeared, for in the latter we have $y$, while the former gives $\frac1y\frac{dy}{dx}$ or $\frac{d\log y}{dx}$. It would, therefore, be necessary to integrate the expression we obtain in order to get terms comparable with the original data, and it is better in practical work to deal with the equations in the forms in which we require them for comparison, rather than by using the differential equations and then integrating the result. The latter method could only give proportional not actual frequencies.

8. The next step is, therefore, to replace the equation

\[ \frac{d\log y}{dx} = \frac{x+a}{b_0 + b_1x + b_2x^2} \]

by one of the form $y = f(x)$, and to do this $\dfrac{x+a}{b_0 + b_1x + b_2x^2}$ must be integrated.

Let us consider equation III. as a general expression for integration, then we notice that the form the integral takes depends on the particular values of the coefficients of $x$ in the denominator. The problem is, in fact, merely a consideration of the forms taken by the denominator for

\[ b_0 + b_1x + b_2x^2 = b_2\left[x - \frac{-b_1 + \sqrt{b_1^2 - 4b_0b_2}}{2b_2}\right]\left[x - \frac{-b_1 - \sqrt{b_1^2 - 4b_0b_2}}{2b_2}\right] \]

and the criterion for fixing the form in a particular case is, obviously, the same as that for the nature of the roots of the equation $b_0 + b_1x + b_2x^2 = 0$, viz., $\dfrac{b_1^2}{4b_0b_2}$, which, by substituting in formula III., gives

\[ \frac{\beta_1(\beta_2 + 3)^2}{4(2\beta_2 - 3\beta_1 - 6)(4\beta_2 - 3\beta_1)}. \]
9. If this is negative the roots are real and of different sign (Type I.), if positive and less than unity they are complex (Type IV.), and if positive and greater than unity they are real and of the same sign (Type VI.). This really covers all the cases, but just at the point where one type changes into another we can use a slightly simpler transition curve. Thus when the criterion is $\infty$, one root is $\infty$ (Type III.), when it is unity the two roots are equal (Type V.), while when it is zero the roots are equal in magnitude but of opposite sign (Type II.). The only other transition curve arises when $b_1 = b_2 = 0$, and the criterion is again zero (Normal Curve of Error, Type VII.).
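The rules of Art. 9 can be put into a small Python sketch. Several points are assumptions of the sketch rather than statements of the text: a numerical tolerance `tol` decides when the criterion counts as exactly 0 or 1, a zero denominator is read as an infinite criterion, and $\beta_1 = 0$ is assigned to Type VII. when $\beta_2 = 3$ and to Type II. otherwise. The function names are hypothetical.

```python
import math


def criterion(beta1, beta2):
    """beta1*(beta2+3)^2 / {4*(2*beta2-3*beta1-6)*(4*beta2-3*beta1)}."""
    denominator = 4 * (2 * beta2 - 3 * beta1 - 6) * (4 * beta2 - 3 * beta1)
    if denominator == 0:
        return math.inf
    return beta1 * (beta2 + 3) ** 2 / denominator


def pearson_type(beta1, beta2, tol=1e-9):
    """Return the type number (as a Roman numeral string) indicated by the criterion."""
    if abs(beta1) <= tol:                     # criterion is zero
        return "VII" if abs(beta2 - 3) <= tol else "II"
    kappa = criterion(beta1, beta2)
    if math.isinf(kappa):
        return "III"
    if kappa < 0:
        return "I"
    if abs(kappa - 1) <= tol:
        return "V"
    return "IV" if kappa < 1 else "VI"


# Hypothetical pairs (beta1, beta2), chosen only to reach each branch.
for pair in [(0.5, 3.0), (0.0, 2.4), (1.0, 4.5), (0.1, 5.0), (1.0, 4.6), (0.0, 3.0)]:
    print(pair, pearson_type(*pair))
```

In practice the transition types are reached only approximately, so the choice of tolerance is a matter of judgment rather than of calculation.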
10. The actual integration can now be considered.

Type I. The factors in the denominator, when the roots of $b_0 + b_1x + b_2x^2 = 0$ are real and of different signs, take the form

\[ b_2\left[x - \frac{-b_1 + \sqrt{\text{a positive quantity}}}{2b_2}\right]\left[x - \frac{-b_1 - \sqrt{\text{a positive quantity}}}{2b_2}\right] \]

and the expression to be integrated is therefore of the form

\[ \frac{x+a}{(x+A_1)(x-A_2)} = \frac{A_1 - a}{A_1 + A_2}\cdot\frac{1}{x+A_1} + \frac{A_2 + a}{A_1 + A_2}\cdot\frac{1}{x-A_2} \]

by partial fractions.

The integration is now simple, and gives

\[ \log y = \frac{A_1 - a}{A_1 + A_2}\log(x+A_1) + \frac{A_2 + a}{A_1 + A_2}\log(x-A_2) + \text{a constant.} \]

\[ y = y'(x+A_1)^{\frac{A_1 - a}{A_1 + A_2}}(x-A_2)^{\frac{A_2 + a}{A_1 + A_2}}, \]

where $y'$ results from the constant introduced by integration. If the origin is now transferred to the mode (i.e., put $x$ for $x + a$), we have

\[ y = y_0\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2}, \]

the form given in Table VI.

Type II. In this type $a_1 = a_2$ in Type I., and it is, therefore, unnecessary to give the working.

Type III. This type is reached when the criterion is $\infty$, which happens when $b_2 = 0$,

\[ \frac{d\log y}{dx} = \frac{x+a}{b_0 + b_1x} = \frac{1}{b_1} + \frac{a - \dfrac{b_0}{b_1}}{b_1x + b_0} \]

\[ \log y = \frac{x}{b_1} + \frac{1}{b_1}\left(a - \frac{b_0}{b_1}\right)\log(b_1x + b_0) + C \]

\[ \text{and}\quad y = y'e^{\frac{x}{b_1}}(b_1x + b_0)^{\frac{1}{b_1}\left(a - \frac{b_0}{b_1}\right)} \]

or, by changing the origin,

\[ y = y_0\left(1 + \frac{x}{a}\right)^{\gamma a}e^{-\gamma x}, \]

where $a$ has a meaning different from that implied in Equation I. This type can be seen to be a particular case of Type I. when $a_2$ becomes infinite.
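The integrated forms for Types I. and III. can be checked numerically by differentiating their logarithms and comparing with the right-hand side of the differential equation. The sketch below takes $y' = 1$ and uses constants chosen only so that every logarithm is defined; all names and values are hypothetical. For Type I. the factor $(A_2 - x)$ is used in place of $(x - A_2)$ so that the power stays real within the range, which leaves the logarithmic derivative unchanged.

```python
import math


def log_derivative(y, x, h=1e-6):
    """Central-difference estimate of d(log y)/dx."""
    return (math.log(y(x + h)) - math.log(y(x - h))) / (2 * h)


# Type I., before the origin is moved to the mode (valid for -A1 < x < A2).
A1, A2, a1 = 3.0, 2.0, 0.4


def type_i(x):
    return ((x + A1) ** ((A1 - a1) / (A1 + A2))
            * (A2 - x) ** ((A2 + a1) / (A1 + A2)))


def rhs_type_i(x):
    return (x + a1) / ((x + A1) * (x - A2))


# Type III. (b2 = 0), valid while b1*x + b0 > 0.
a3, b0, b1 = 0.4, 2.0, 0.5


def type_iii(x):
    return math.exp(x / b1) * (b1 * x + b0) ** ((a3 - b0 / b1) / b1)


def rhs_type_iii(x):
    return (x + a3) / (b0 + b1 * x)


def max_error(curve, rhs, points):
    return max(abs(log_derivative(curve, x) - rhs(x)) for x in points)


print(max_error(type_i, rhs_type_i, [-2.0, -1.0, 0.0, 1.0]))
print(max_error(type_iii, rhs_type_iii, [0.0, 0.5, 1.0, 2.0]))
```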

Type IV. If the roots of the equation $b_0 + b_1x + b_2x^2 = 0$ are complex, it is impossible to throw the denominator into real factors; and when this occurs, we have to integrate by putting the expression on the right-hand side of the fundamental differential equation in the form

\[ \frac{X + c}{b_2(X^2 + A^2)} \]

where $X = x + \dfrac{b_1}{2b_2}$, $c = a - \dfrac{b_1}{2b_2}$, and $A^2 = \dfrac{b_0}{b_2} - \dfrac{b_1^2}{4b_2^2}$.

Then

\[ \log y = \int\frac{X + c}{b_2(X^2 + A^2)}\,dX = \int\frac{X}{b_2(X^2 + A^2)}\,dX + \int\frac{c}{b_2(X^2 + A^2)}\,dX \]

\[ = \frac{1}{2b_2}\log(X^2 + A^2) + \frac{c}{b_2A}\tan^{-1}\frac{X}{A} + \text{constant.} \]

\[ y = y'(X^2 + A^2)^{\frac{1}{2b_2}}\,e^{\frac{c}{b_2A}\tan^{-1}\frac{X}{A}} \]

or

\[ y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}\frac{x}{a}}, \]

where $a$ has a meaning different from that implied in equation I. The relation between this type and Type I. can be seen by factorising the denominator of the right-hand side of the differential equation, $b_2(X - iA)(X + iA)$, and then obtaining an expression for $y$ having the same form as Type I., but containing complex expressions.
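As with the earlier types, the Type IV. integral can be checked by numerical differentiation. The constants below are hypothetical and are chosen only so that $b_1^2 < 4b_0b_2$, which makes the roots complex; $y'$ is taken as unity.

```python
import math

# Hypothetical constants with b1**2 < 4*b0*b2 (complex roots).
a, b0, b1, b2 = 0.3, 2.0, 0.4, 0.5

shift = b1 / (2 * b2)                     # X = x + b1/(2*b2)
c = a - shift
A = math.sqrt(b0 / b2 - b1 ** 2 / (4 * b2 ** 2))


def type_iv(x):
    """y = (X^2 + A^2)^(1/(2*b2)) * exp{(c/(b2*A)) * arctan(X/A)}."""
    X = x + shift
    return ((X * X + A * A) ** (1 / (2 * b2))
            * math.exp(c / (b2 * A) * math.atan(X / A)))


def right_hand_side(x):
    return (x + a) / (b0 + b1 * x + b2 * x * x)


def log_derivative(y, x, h=1e-6):
    """Central-difference estimate of d(log y)/dx."""
    return (math.log(y(x + h)) - math.log(y(x - h))) / (2 * h)


errors = [abs(log_derivative(type_iv, x) - right_hand_side(x))
          for x in (-3.0, -1.0, 0.0, 1.5, 4.0)]
print(max(errors))
```

Unlike Types I. and III., the curve is defined for every value of $x$, since $X^2 + A^2$ never vanishes.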

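Type V. In this case, when the roots are real and equal,

\[ \frac{d\log y}{dx} = \frac{x+a}{b_2\left(x + \dfrac{b_1}{2b_2}\right)^2} \]

\[ \log y = \int\left\{\frac{1}{b_2\left(x + \dfrac{b_1}{2b_2}\right)} + \frac{a - \dfrac{b_1}{2b_2}}{b_2\left(x + \dfrac{b_1}{2b_2}\right)^2}\right\}dx = \frac{1}{b_2}\log\left(x + \frac{b_1}{2b_2}\right) - \frac{a - \dfrac{b_1}{2b_2}}{b_2\left(x + \dfrac{b_1}{2b_2}\right)} + \text{constant} \]

\[ y = y'\left(x + \frac{b_1}{2b_2}\right)^{\frac{1}{b_2}}\exp\left\{-\frac{a - \dfrac{b_1}{2b_2}}{b_2\left(x + \dfrac{b_1}{2b_2}\right)}\right\} \]

\[ = y_0\,x^{-p}\,e^{-\gamma/x} \]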

Type VI. The factorising is the same as Type I., but the roots of the equation being of like sign, the factors of the denominator take the form $(x + A_1)(x + A_2)$. The work is then the same, but at the end the origin is put not at the mode but so that one of the expressions $x + A_1$ or $x + A_2$ can be written as $x$. The form is then

\[ y = y_0(x - a)^{m_2}\,x^{-m_1}. \]

Type VII. Putting $b_1 = b_2 = 0$,

\[ \log y = \int\frac{x+a}{b_0}\,dx = \frac{x^2}{2b_0} + \frac{ax}{b_0} + \text{constant} = \frac{(x+a)^2}{2b_0} + \text{constant} \]

\[ y = y'e^{(x+a)^2/2b_0} \]

or, by changing the origin and remembering that the sign of the expression in equation III. is negative,

\[ y = y_0\,e^{-x^2/2\sigma^2}. \]

11. The table on p. 47 gives a list of the curves, a description of their appearance and range, the position of the mode and the criteria. The values of $\beta_1$ and $\beta_2$ in the cases of Type II. and Type VII. can be seen to be required by examining equation III. The third moment about the mean must be very small (theoretically, zero) if the curve is symmetrical, and therefore $\beta_1 = 0$, and it is only when $\beta_2 = 3$ and $\beta_1 = 0$ that both the coefficients of $x$ and $x^2$ in equation III. vanish; this being the condition for Type VII.

12.  It  is  now  necessary  to  recapitulate  the  method,  and  see 
the  steps  that  have  to  be  taken  to  fit  a  frequency-curve  to 
statistics. 

1. Arrange the statistics in sequence.

2. Calculate the moments about a convenient vertical.

3. Transfer the moments to the centroid vertical (vertical through the mean).

4. If there is high contact at both ends of the curve, apply Sheppard's adjustments to the moments (i.e., deduct $\tfrac{1}{12}$ and $\tfrac{1}{2}\nu_2 - \tfrac{7}{240}$ from the second and fourth moments respectively).

5. Calculate the criterion.

6. By means of Table VI. decide which curve should be used.

Table  VI.  gives  a  reference  to  the  page  on  which  the 
formulae  for  the  constants  of  each  curve  in  terms  of  the 
moments  are  to  be  found. 
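
Steps 4 and 5 can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical, a unit grouping interval is assumed for Sheppard's adjustments, and the criterion is taken in the form printed with the formulae for moments, $\kappa = \beta_1(\beta_2+3)^2 / \{4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6)\}$. The figures passed in at the end are the moments of the Type I. example worked in Chapter V.

```python
import math


def sheppard_adjust(nu2, nu3, nu4):
    """Sheppard's adjustments for a unit grouping interval (step 4)."""
    mu2 = nu2 - 1.0 / 12.0
    mu3 = nu3  # the third moment is left unchanged
    mu4 = nu4 - (0.5 * nu2 - 7.0 / 240.0)
    return mu2, mu3, mu4


def betas_and_criterion(mu2, mu3, mu4):
    """Return (beta1, beta2, kappa) from moments about the mean (step 5)."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    denom = 4.0 * (4.0 * beta2 - 3.0 * beta1) * (2.0 * beta2 - 3.0 * beta1 - 6.0)
    # A vanishing denominator corresponds to the infinite criterion (Type III.).
    kappa = math.inf if denom == 0 else beta1 * (beta2 + 3.0) ** 2 / denom
    return beta1, beta2, kappa


# Moments of the Type I. example (no adjustment, as there is not high contact).
beta1, beta2, kappa = betas_and_criterion(7.66237, 15.1069, 172.326)
print(round(beta1, 5), round(beta2, 5), round(kappa, 3))
```

The criterion so obtained is then looked up in Table VI. (step 6).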


[Table VI. (p. 47) is not legible in this copy. For each type it gives the criteria ($\kappa$, $\beta_1$, $\beta_2$), the page on which the calculation of the constants is explained, the equation to the curve and a description of the curve; the first description reads "Limited range in both directions (skew)".]

CHAPTER V.

Calculation. 

1.  The  next  point  to  be  considered  is  the  calculation  of  the 
constants  for  any  particular  distribution,  when  the  moments 
have  been  calculated  and  the  type  to  be  used  has  been 
decided.  The  formulas  required  for  the  numerical  work  will 
be  given  for  each  type,  a  numerical  example,  including  the 
calculation  of  the  graduated  figures,  will  follow,  and  finally 
the proofs of the formulae.

2. Some general points relating to the calculation of the
curves when the constants have been found may be
conveniently considered here. When the constants are
known, we can calculate the ordinate for any value of $x$ by
substituting that value in the expression for the frequency-curve;
and if areas are required, some method of proceeding
from ordinates to areas must be found. The most simple is
probably to calculate mid-ordinates, and then by the
quadrature formula I. or II. find the areas. It is occasionally
more convenient to calculate the ordinates at the beginning of
each group, and then formula III. should be used. These
formulae can be best applied in the form of differences; thus,
from II. we have

[The difference forms derived from formulae II., I. and III. are printed at this point; they are not legible in this copy.]

Formula II. is generally sufficiently accurate, while the others
will be found to give a result true to five figures in ordinary
cases; exceptional cases will be referred to in the numerical
examples that follow.

3.  It  is  sometimes  a  help  to  see  the  graduation  expressed 
graphically,  and  this  has  been  done  with  some  of  the  examples. 
The best method is to insert a vertical height $y_0$ at the mode;
note  the  ends  of  the  curve,  and  the  heights  of  the  ordinates 
that  have  been  calculated.  These  heights  give  points  on  the 
curve,  which  can  be  drawn  through  them  fairly  easily.  In 
drawing  the  curve,  as  well  as  in  calculating  the  constants,  the 
sign  of  the  skewness  must  be  borne  in  mind,  for  it  is  possible 
to  draw  the  curve  with  the  skewness  on  the  wrong  side  of  the 
mode, and if the distribution is nearly symmetrical, it is not
so  easy  to  notice  the  mistake  as  it  seems  to  be.  The  tangent 
to  the  curve  at  the  mode  is  parallel  to  the  axis  of  x. 

4.  It  is  best  to  draw  on  a  rather  large  scale  in  order  to  gain 
distinctness,  and  the  curves  given  here  were  drawn  larger 
than  their  present  size ;  the  reduction  being,  of  course,  made 
in  the  process  of  reproduction. 

The  base  elements  should  also  be  fairly  large  in  proportion 
to  the  height,  so  that  the  curve  may  not  ascend  too  steeply  ; 
otherwise  small  horizontal  differences  between  the  graduated 
and  ungraduated  curves  are  apt  to  conceal  large  vertical 
differences  when  the  curve  is  rising  or  falling  rapidly,  but  it 
is the latter differences that are of importance. It is sometimes
necessary to use more closely-ruled paper than that
generally favoured by actuaries, and it can be procured in very
convenient rulings from Messrs. W. G. Pye & Co., Granta
Works, Cambridge.

5.  The  reader  should  notice  that  all  the  cases  considered  in 
the  following  pages  assume  complete  distributions,  and  it  is  in 
general  only  possible  to  find  the  curve  from  part  of  a 
distribution  by  means  of  successive  approximation  which  is 
extremely  laborious.  Another  point,  to  which  reference  will 
again  be  made,  is  with  regard  to  grouping  statistics  ;  it  is 
sometimes  impossible  to  obtain  many  groups,  but  for  accuracy 
in  finding  moments  the  greater  the  number  of  groups  the 
better,  unless  the  total  number  of  cases  is  small.  A  little 
discretion  is  needed  in  this  respect,  but  in  actuarial  statistics 
which are sometimes based on as many as 200,000 cases,
seventy or eighty groups would not be excessive. In our
examples we have grouped merely to save work, space and
printing, and the grouping does not alter the method.

6. Another matter with which it seems advisable to deal here
is connected with the criterion, $\kappa$. This may have any value
from $-\infty$ to $+\infty$, and from the following diagram it will be
seen how the types cover all the possible values of the criterion
and do not overlap.

$\kappa = -\infty$: Type III.
$\kappa$ negative: Type I.
$\kappa = 0$: Type VII. when $\beta_2 = 3$; Type II. when $\beta_2$ not $= 3$.
$\kappa > 0$ and $< 1$: Type IV.
$\kappa = 1$: Type V.
$\kappa > 1$: Type VI.
$\kappa = \infty$: Type III.

Just before $\kappa = 0$ Type I. becomes nearly symmetrical,
and after that value is passed we have a skew curve of
unlimited range, and so on. At each critical point there is a
"transition" curve, as it is sometimes called; so Types II.,
III., V. and VII. are the transition types. If by a mistake a
student should use the wrong type he will necessarily find his
mistake by reaching an imaginary in one of the square roots
which occur in the equations for the constants, but transition
types can be used when the values of the criterion approximate
to the theoretical values; they can, in fact, be viewed as
approximations which give an accurate result in a limiting
case. It is impossible to say within what limits one is justified
in using a transition type; theoretically the justification
depends on the size of the probable error of the function dealt
with, but in practice one can be guided to a great extent by
the size of the experience; if there are few cases a larger
deviation in the criterion will arise than if there are many. It
would probably be sufficiently accurate to use Type III.,
provided $\kappa$ was arithmetically greater than 4; individual cases
must be considered on their merits, but if the student finds
himself in doubt he should avoid using the transition type as
he will then be on the safe side in the matter of accuracy.
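
The way the criterion selects a type can be sketched as a small Python function. This is a hypothetical helper, not part of the original method: the tolerance `tol` for recognising the transition values 0 and 1, and the optional threshold `large` beyond which Type III. is accepted, are assumptions introduced here for illustration, and in practice the probable error of $\kappa$ would guide those choices.

```python
import math


def pearson_type(kappa, beta2, tol=1e-9, large=math.inf):
    """Return the type indicated by the criterion kappa.

    tol   -- hypothetical tolerance for treating kappa as exactly 0 or 1
    large -- hypothetical threshold; |kappa| above it is treated as Type III.
    """
    if math.isinf(kappa) or abs(kappa) > large:
        return "III"
    if abs(kappa) <= tol:
        # Symmetrical transition types, separated by the value of beta2.
        return "VII" if abs(beta2 - 3.0) <= tol else "II"
    if abs(kappa - 1.0) <= tol:
        return "V"
    if kappa < 0:
        return "I"
    if kappa < 1:
        return "IV"
    return "VI"


print(pearson_type(-0.2645, 2.935110))        # criterion of the Type I. example
print(pearson_type(-8.44, 9.1075, large=4.0)) # a large criterion, threshold of 4
```

With `large` left at its default the second call would return Type I., which is the strict reading of the diagram; the threshold only encodes the practical remark about values arithmetically greater than 4.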
7. In the formulae that are given for the various types, the
choice of sign for a square root depends on the sign of $\mu_3$. If
the frequency is concentrated more closely before the mean
than after it, the mode is on the left-hand side of the mean and
$\mu_3$ is positive; the signs of certain constants in each type must
therefore depend on the sign of $\mu_3$ in order that the mode and
mean may lie in their correct relative positions. Where,
however, no remark is made as to the sign of the expression in
which a square root is given the positive root is implied, and
the reader will find that these rules become easier to follow
when he has worked out two examples, one giving a positive
and the other a negative value for $\mu_3$. Thus, if we imagine
the frequencies in the example for Type I. to be written in
the opposite order 1, 3, 7, 13, &c., all the numerical work
would be the same, but $m_1$ would be 2.776978, $m_2 = .409833$,
$a_1 = 13.52728$, and $a_2 = 1.99638$, and the graduation would be
the same, but the numbers in the columns of the table on p. 56
would run in the opposite order.


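FORMULAE FOR MOMENTS.

These Formulae apply to all the Types of Curves.

$$\nu'_1 = d$$
$$\nu_2 = \nu'_2 - d^2$$
$$\nu_3 = \nu'_3 - 3d\nu_2 - d^3$$
$$\nu_4 = \nu'_4 - 4d\nu_3 - 6d^2\nu_2 - d^4$$

or

$$S_2 = d$$
$$\nu_2 = 2S_3 - d(1+d)$$
$$\nu_3 = 6S_4 - 3\nu_2(1+d) - d(1+d)(2+d)$$
$$\nu_4 = 24S_5 - 2\nu_3\{2(1+d)+1\} - \nu_2\{6(1+d)(2+d)-1\} - d(1+d)(2+d)(3+d)$$

Sheppard's adjustments when the curve has high contact:

$$\mu_2 = \nu_2 - \tfrac{1}{12}, \qquad \mu_3 = \nu_3, \qquad \mu_4 = \nu_4 - \tfrac{1}{2}\nu_2 + \tfrac{7}{240}$$

$$\sigma \text{ (standard deviation)} = \sqrt{\mu_2}, \qquad \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}$$

$$\kappa = \frac{\beta_1(\beta_2+3)^2}{4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6)}$$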
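
As a check on the two sets of transfer formulae, the following Python sketch evaluates them for the two examples of this chapter. The function names are hypothetical. The first function takes moments about an arbitrary vertical; the second takes the quantities $S_2, \dots, S_5$ of the summation method.

```python
def transfer_to_mean(d, nu2_raw, nu3_raw, nu4_raw):
    """Moments about the mean from moments about an arbitrary vertical.

    d is the first moment about that vertical (the distance of the mean).
    """
    nu2 = nu2_raw - d ** 2
    nu3 = nu3_raw - 3 * d * nu2 - d ** 3
    nu4 = nu4_raw - 4 * d * nu3 - 6 * d ** 2 * nu2 - d ** 4
    return nu2, nu3, nu4


def moments_from_sums(S2, S3, S4, S5):
    """Moments about the mean from the summation-method quantities."""
    d = S2
    nu2 = 2 * S3 - d * (1 + d)
    nu3 = 6 * S4 - 3 * nu2 * (1 + d) - d * (1 + d) * (2 + d)
    nu4 = (24 * S5
           - 2 * nu3 * (2 * (1 + d) + 1)
           - nu2 * (6 * (1 + d) * (2 + d) - 1)
           - d * (1 + d) * (2 + d) * (3 + d))
    return d, nu2, nu3, nu4


# Type I. example: S2..S5 as found by the summation method.
print(moments_from_sums(5.175, 19.809, 64.389, 186.638))

# Type II. example: moments about the centre of the 15-19 group.
print(transfer_to_mean(0.4985146, 2.161022, 3.104576, 12.60666))
```

The first call reproduces the moments 7.66237, 15.1069 and 172.326 quoted in the Type I. example to within rounding; the second gives the unadjusted moments of the Type II. example, to which Sheppard's adjustments are then applied.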


TYPE I.

$$y = y_0\left(1+\frac{x}{a_1}\right)^{m_1}\left(1-\frac{x}{a_2}\right)^{m_2}, \qquad \frac{m_1}{a_1} = \frac{m_2}{a_2}$$

FORMULAE.

The values to be calculated in order are

$$r = \frac{6(\beta_2-\beta_1-1)}{6+3\beta_1-2\beta_2}$$

$$b = \tfrac{1}{2}\sqrt{\mu_2}\,\sqrt{\beta_1(r+2)^2+16(r+1)}$$

$m_1$ and $m_2$ are given by

$$\tfrac{1}{2}\left\{r-2 \pm r(r+2)\sqrt{\frac{\beta_1}{\beta_1(r+2)^2+16(r+1)}}\right\}$$

and $a_1 + a_2 = b$, and $a_1 \div m_1 = a_2 \div m_2$

$$y_0 = \frac{N}{b}\cdot\frac{m_1^{m_1}m_2^{m_2}}{(m_1+m_2)^{m_1+m_2}}\cdot\frac{\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\,\Gamma(m_2+1)}$$

A table of $\Gamma$ functions is required (see Appendix II.).

$$\text{Skewness} = \tfrac{1}{2}\sqrt{\beta_1}\,\frac{r+2}{r-2}$$

$$\text{Mode} = \text{Mean} - \tfrac{1}{2}\,\frac{\mu_3}{\mu_2}\cdot\frac{r+2}{r-2}$$

NOTES.

$m_1$ is taken with the negative root when $\mu_3$ is positive, and
as the positive root when $\mu_3$ is negative.

Sometimes $m_1$ is negative, which means that the curve has
a similar shape to that given in the numerical example of
Type III.; it starts at infinity, and falls rapidly; so that
though the ordinate is infinite, the area is finite. The
difference is that in Type I. the curve ends at a fixed
point, while in Type III. it continues indefinitely. In this
case a little care is needed in taking out the $\Gamma$ function,
for $\Gamma(t)$ is required where $t < 1$; the tables give $\log\Gamma(1+t)$,
i.e., $\log t + \log\Gamma(t)$. If both $m_1$ and $m_2$ are negative, a
U-shaped curve is obtained.


EXAMPLE. 


As an example of this type the figures given in Table I.
(Example II.) may be used. The moments were first found by
Mr. Hardy's Summation Method (see Chap. III., Art. 9) in the
following form:

| Central Age of Group. | Exposed to Risk (Example II. of Table I.). | First Sum. | Second Sum. | Third Sum. | Fourth Sum. |
|---|---|---|---|---|---|
| 17 | 34 | 1,000 | 5,175 | 19,809 | 64,389 |
| 22 | 145 | 966 | 4,175 | 14,634 | 44,580 |
| 27 | 156 | 821 | 3,209 | 10,459 | 29,946 |
| 32 | 145 | 665 | 2,388 | 7,250 | 19,487 |
| 37 | 123 | 520 | 1,723 | 4,862 | 12,237 |
| 42 | 103 | 397 | 1,203 | 3,139 | 7,375 |
| 47 | 86 | 294 | 806 | 1,936 | 4,236 |
| 52 | 71 | 208 | 512 | 1,130 | 2,300 |
| 57 | 55 | 137 | 304 | 618 | 1,170 |
| 62 | 37 | 82 | 167 | 314 | 552 |
| 67 | 21 | 45 | 85 | 147 | 238 |
| 72 | 13 | 24 | 40 | 62 | 91 |
| 77 | 7 | 11 | 16 | 22 | 29 |
| 82 | 3 | 4 | 5 | 6 | 7 |
| 87 | 1 | 1 | 1 | 1 | 1 |
| Totals | 1,000 | 5,175 | 19,809 | 64,389 | 186,638 |

$S_2 = 5175 \div 1000 = 5.175$

$S_3 = 19809 \div 1000 = 19.809$

$S_4 = 64389 \div 1000 = 64.389$

$S_5 = 186638 \div 1000 = 186.638$
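
The successive sums in the table can be reproduced with a few lines of Python. This is a minimal sketch with a hypothetical function name: each "Sum" column is taken to be the running total of the preceding column accumulated from the oldest group upwards, which is what the printed figures show.

```python
def summation_totals(freqs, orders=4):
    """Return [N, total of first sum, total of second sum, ...].

    Each sum column is the cumulative total of the previous column,
    accumulated from the last group back to the first.
    """
    col = list(freqs)
    totals = [sum(col)]
    for _ in range(orders):
        running = 0
        new_col = []
        for value in reversed(col):
            running += value
            new_col.append(running)
        col = new_col[::-1]          # restore the original group order
        totals.append(sum(col))
    return totals


exposed = [34, 145, 156, 145, 123, 103, 86, 71, 55, 37, 21, 13, 7, 3, 1]
totals = summation_totals(exposed)
N = totals[0]
S2, S3, S4, S5 = (t / N for t in totals[1:])
print(totals)
print(S2, S3, S4, S5)
```

Dividing the column totals by $N$ gives $S_2$ to $S_5$ as above.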

The next step is to find the moments about the centroid
vertical by means of the formulae on p. 21. In this case there
is not high contact, so no adjustments* are to be made in the
moments and the $\nu$'s and $\mu$'s are the same; we have

$\mu_2 = 7.66237$

$\mu_3 = 15.1069$

$\mu_4 = 172.326$

$\beta_1 = .5072955$

$\beta_2 = 2.935110$

From the values of $\beta_1$ and $\beta_2$ the criterion ($\kappa$) can be
calculated, and its value being $-.2645$ shows that Type I. must
be used (see Table VI.).

$r = 5.186811$, $\log r = .7149004$

$r + 1 = 6.186811$, $\log(r+1) = .7914669$

$r + 2 = 7.186811$, $\log(r+2) = .8565363$

$r - 2 = 3.186811$, $\log(r-2) = .5033563$

The values of $\log(r+1)$, &c., were checked by a
Gauss-logarithm table.

$b = 15.52366$

$m_1 = .409833$

$m_2 = 2.776978$

$a_1 = 1.99638$

$a_2 = 13.52728$

Mean $-$ mode $= 2.223116$

It will be noted that the expression $\{\beta_1(r+2)^2 + 16(r+1)\}^{1/2}$
occurs in both the values of $b$ and $m$.

The mean is at age $12 + 5.175 \times 5 = 37.8750$, and the mode
at age $37.8750 - 2.223116 \times 5 = 26.75942$.

The skewness is .8032.
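
The Type I. formulae can be evaluated directly, as in the Python sketch below. The function name and the dictionary it returns are hypothetical. The sign rule of the Notes is applied by giving $m_1$ the negative root when $\mu_3$ is positive, and the skewness is given the sign of $\mu_3$. Evaluated in floating point from the printed moments, $r$, the skewness and the distance between mean and mode agree with the figures above; $b$, $m_1$ and $m_2$ come out within about one per cent of the printed values (for instance $b \approx 15.49$ against 15.52366), and the sketch makes no attempt to account for that difference.

```python
import math


def type1_constants(mu2, mu3, mu4):
    """Constants of a Type I. curve from the moments about the mean."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    r = 6 * (beta2 - beta1 - 1) / (6 + 3 * beta1 - 2 * beta2)
    big = beta1 * (r + 2) ** 2 + 16 * (r + 1)   # occurs in both b and m
    b = 0.5 * math.sqrt(mu2) * math.sqrt(big)
    root = r * (r + 2) * math.sqrt(beta1 / big)
    smaller, larger = 0.5 * (r - 2 - root), 0.5 * (r - 2 + root)
    # m1 takes the negative root when mu3 is positive.
    m1, m2 = (smaller, larger) if mu3 > 0 else (larger, smaller)
    a1 = b * m1 / (m1 + m2)
    a2 = b * m2 / (m1 + m2)
    skewness = math.copysign(0.5 * math.sqrt(beta1) * (r + 2) / (r - 2), mu3)
    mean_minus_mode = 0.5 * (mu3 / mu2) * (r + 2) / (r - 2)
    return {"r": r, "b": b, "m1": m1, "m2": m2, "a1": a1, "a2": a2,
            "skewness": skewness, "mean_minus_mode": mean_minus_mode}


c = type1_constants(7.66237, 15.1069, 172.326)
for name, value in c.items():
    print(name, round(value, 5))
print("mode at age", round(37.8750 - 5 * c["mean_minus_mode"], 3))
```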

* In work which has a permanent object depending on a considerable degree
of accuracy, grouping should be avoided. In the examples given it was simply
done to save labour, and the original reasons for which the curves were
calculated did not require extreme accuracy. If we had not grouped our
statistics we should have reduced the error resulting from our not knowing the
best adjustments to use in cases in which there is not high contact.

The calculation of $\log y_0$ is as follows:

$\log N = 3.00000$

$\text{colog}\, b = \bar{2}.80901$

$m_1 \log m_1 = \bar{1}.84123$

$m_2 \log m_2 = 1.23179$

$\text{colog}\,(r-2)^{r-2} = \bar{2}.39590$

$\log \Gamma(r) = 1.50406$

$\text{colog}\,\Gamma(m_1+1) = .05219$

$\text{colog}\,\Gamma(m_2+1) = \bar{1}.34037$

$\log y_0 = 2.17455$

where, of course, $\log\Gamma(m_2+1) = \log\Gamma(3.776978) = \log 2.776978
+ \log 1.776978 + \log\Gamma(1.776978)$, the last value being taken
from the table at the end of the book.
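
The same calculation can be sketched in Python, with the standard library's `math.lgamma` taking the place of the table of $\Gamma$ functions. The function name is hypothetical, and the sketch assumes $m_1$ and $m_2$ are both positive, as they are in this example; the constants passed in are the printed ones.

```python
import math


def type1_y0(N, b, m1, m2):
    """y0 of a Type I. curve, worked through natural logarithms.

    Assumes m1 > 0 and m2 > 0 so that the logarithms exist.
    """
    log_y0 = (math.log(N) - math.log(b)
              + m1 * math.log(m1) + m2 * math.log(m2)
              - (m1 + m2) * math.log(m1 + m2)
              + math.lgamma(m1 + m2 + 2)
              - math.lgamma(m1 + 1) - math.lgamma(m2 + 1))
    return math.exp(log_y0)


y0 = type1_y0(1000, 15.52366, 0.409833, 2.776978)
print(round(y0, 2), round(math.log10(y0), 5))
```

Here $m_1 + m_2 = r - 2$ and $m_1 + m_2 + 2 = r$, which is why the tabular work above uses $(r-2)^{r-2}$ and $\Gamma(r)$.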

The work to this point gives as the curve for graduating
the statistics

$$y = 149.47\left(1+\frac{x}{1.99638}\right)^{.409833}\left(1-\frac{x}{13.52728}\right)^{2.776978}$$

where the origin is at age 26.75942 and the unit is five years.
The following table shows the calculation of ordinates of
the curve from the equation just given:


| (1) Age | (2) $1+\frac{x}{a_1}$ | (3) $1-\frac{x}{a_2}$ | (4) log (2) | (5) log (3) | (6) $m_1 \times$ col (4) | (7) $m_2 \times$ col (5) | (8) col (6) + col (7) + $\log y_0$ $= \log y_x$ | (9) $y_x$ |
|---|---|---|---|---|---|---|---|---|
| 17 | .02228 | 1.14429 | $\bar{2}$.34792 | 0.05854 | $\bar{1}$.3229 | .1626 | 1.6601 | 45.7 |
| 22 | .52319 | 1.07037 | $\bar{1}$.71866 | .02955 | $\bar{1}$.8847 | .0821 | 2.1404 | 138.2 |
| 27 | 1.02410 | .99644 | 0.01034 | $\bar{1}$.99845 | 0.0042 | $\bar{1}$.9957 | 2.1745 | 149.5 |
| 32 | 1.52501 | .92252 | .18327 | $\bar{1}$.96498 | .0751 | $\bar{1}$.9027 | 2.1525 | 142.1 |
| 37 | 2.02592 | .84859 | .30662 | $\bar{1}$.92870 | .1257 | $\bar{1}$.8020 | 2.1023 | 126.6 |
| 42 | 2.52683 | .77466 | .40257 | $\bar{1}$.88911 | .1650 | $\bar{1}$.6921 | 2.0317 | 107.6 |
| 47 | 3.02774 | .70074 | .48111 | $\bar{1}$.84556 | .1972 | $\bar{1}$.5711 | 1.9429 | 87.7 |
| 52 | 3.52865 | .62681 | .54760 | $\bar{1}$.79714 | .2244 | $\bar{1}$.4367 | 1.8357 | 68.5 |
| 57 | 4.02956 | .55289 | .60526 | $\bar{1}$.74264 | .2481 | $\bar{1}$.2853 | 1.7080 | 51.0 |
| 62 | 4.53047 | .47896 | .65615 | $\bar{1}$.68030 | .2689 | $\bar{1}$.1122 | 1.5557 | 36.0 |
| 67 | 5.03138 | .40504 | .70169 | $\bar{1}$.60750 | .2876 | $\bar{2}$.9100 | 1.3722 | 23.6 |
| 72 | 5.53229 | .33111 | .74291 | $\bar{1}$.51997 | .3045 | $\bar{2}$.6670 | 1.1461 | 14.0 |
| 77 | 6.03320 | .25719 | .78055 | $\bar{1}$.41025 | .3199 | $\bar{2}$.3623 | .8568 | 7.2 |
| 82 | 6.53411 | .18326 | .81519 | $\bar{1}$.26307 | .3341 | $\bar{3}$.9535 | .4622 | 2.9 |
| 87 | 7.03502 | .10934 | .84726 | $\bar{1}$.03878 | .3472 | $\bar{3}$.3307 | $\bar{1}$.8525 | .7 |
| 92 | 7.53593 | .03541 | .87714 | $\bar{2}$.54913 | .3595 | $\bar{5}$.9709 | $\bar{2}$.5050 | [illegible] |

[Col. (10), the areas, is not legible in this copy.]

Cols. (2) and (3) have a constant first difference, viz.,
$\frac{1}{a_1}$, or .500907, and $\frac{1}{a_2}$, or .073925. The value at any point
having been calculated and checked, the other items are
formed continuously. Cols. (4) to (9) explain themselves, but
we may remark that it is generally advisable to use a larger
number of figures than five in taking logarithms, especially if
$m_1$ or $m_2$ is large. A little care is necessary in multiplying
such numbers as $\bar{1}.71866$ by $m_1$ (.409833). If an arithmometer
is used, $m_1$ is put on the plate, and is multiplied by $-.28134$,
and the result $-.1153$ must be put in the form $\bar{1}.8847$, to
enable us to add it to other logarithms. Col. (10) gives the
area, and was formed by applying one of the formulae on p. 48.
The area of the first group must be treated separately, as the
curve starts at age 16.7775, and the base of the group is
therefore 2.7225 in length, instead of 5 years as in the other
cases. A good way to find the area is to calculate the
ordinates for the middle and ends of the base, and apply
Simpson's rule, viz.:

$$\int_0^1 y_x\,dx = \tfrac{1}{6}\{y_0 + 4y_{1/2} + y_1\},$$

remembering to multiply the result by $\frac{2.7225}{5}$ to allow for the
different length of the base.

The mid-ordinate is 92.1, the ordinate at the end of the
base is 116.5, and the ordinate at the start is of course zero;
the area is approximately

$$\frac{2.7225}{5} \times \tfrac{1}{6}\{0 + 4 \times 92.1 + 116.5\} = 44.$$
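
The ordinates and the area of the first group can be checked with the Python sketch below. The function names are hypothetical and the constants are the printed ones. The upper boundary of the first group is taken as age 19.5, an assumption that follows from the stated start of the curve (16.7775) and base (2.7225).

```python
def type1_ordinate(age, y0=149.47, a1=1.99638, a2=13.52728,
                   m1=0.409833, m2=2.776978, mode_age=26.75942, unit=5.0):
    """Ordinate of the fitted Type I. curve at a given age."""
    x = (age - mode_age) / unit          # x is measured from the mode
    left = 1 + x / a1
    right = 1 - x / a2
    if left <= 0 or right <= 0:          # outside the range of the curve
        return 0.0
    return y0 * left ** m1 * right ** m2


def simpson_area(y_start, y_mid, y_end, base, unit=5.0):
    """Simpson's rule over one base, rescaled for a base shorter than the unit."""
    return (base / unit) * (y_start + 4 * y_mid + y_end) / 6


for age in (17, 27, 52):
    print(age, round(type1_ordinate(age), 1))

start, base = 16.7775, 2.7225
area = simpson_area(0.0,
                    type1_ordinate(start + base / 2),
                    type1_ordinate(start + base),
                    base)
print(round(area, 1))
```

The ordinate at the start is passed as zero directly, as in the text, rather than evaluated at the rounded starting age.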

o  b 


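PROOF OF FORMULAE.*

The equation to the curve is

$$y = y_0\left(1+\frac{x}{a_1}\right)^{m_1}\left(1-\frac{x}{a_2}\right)^{m_2}, \quad\text{where } \frac{m_1}{a_1} = \frac{m_2}{a_2}.$$

Let $a_1 + a_2 = b$ and $z = \dfrac{a_1 + x}{a_1 + a_2}$.

* The reader who has little acquaintance with formulae of reduction and the
$\Gamma$ and B functions, should consult Appendix II. before reading the proofs of the
formulae for this and the other types.

The area from $x = -a_1$ to $x = +a_2$ is the total frequency $N$.

$$N = \int_{-a_1}^{a_2} \frac{y_0}{a_1^{m_1}a_2^{m_2}}\,(a_1+x)^{m_1}(a_2-x)^{m_2}\,dx = \int_0^1 \frac{y_0}{a_1^{m_1}a_2^{m_2}}\,b^{m_1+m_2+1}\,z^{m_1}(1-z)^{m_2}\,dz$$

$$y_0 = \frac{N}{b}\cdot\frac{m_1^{m_1}m_2^{m_2}}{(m_1+m_2)^{m_1+m_2}}\cdot\frac{\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\,\Gamma(m_2+1)}.$$

Using the same method for the moments as that just given
for the area, we see that the $n$th moment, about the line
parallel to the axis of $y$ through $x = -a_1$, is

$$\int_{-a_1}^{a_2} \frac{y_0}{a_1^{m_1}a_2^{m_2}}\,(a_1+x)^{m_1+n}(a_2-x)^{m_2}\,dx = \int_0^1 \frac{y_0}{a_1^{m_1}a_2^{m_2}}\,b^{m_1+m_2+n+1}\,z^{m_1+n}(1-z)^{m_2}\,dz$$

$$= \frac{y_0\,(m_1+m_2)^{m_1+m_2}\,b^{n+1}}{m_1^{m_1}m_2^{m_2}}\cdot\frac{\Gamma(m_1+n+1)\,\Gamma(m_2+1)}{\Gamma(m_1+m_2+n+2)}.$$

Now, since $\Gamma(p) = (p-1)\Gamma(p-1)$, the moments about the
line parallel to the axis of $y$ through $x = -a_1$ are as follows:

$$\mu'_1 = \frac{b(m_1+1)}{m_1+m_2+2}$$

$$\mu'_2 = \frac{b^2(m_1+1)(m_1+2)}{(m_1+m_2+2)(m_1+m_2+3)}, \text{ and so on.}$$

Changing the origin in order to get moments about the mean
and writing $m'_1 = m_1 + 1$ and $m'_2 = m_2 + 1$ and $r = m'_1 + m'_2$
we have

$$\mu_2 = \frac{b^2 m'_1 m'_2}{r^2(r+1)}$$

$$\mu_3 = \frac{2b^3 m'_1 m'_2 (m'_2 - m'_1)}{r^3(r+1)(r+2)}$$

$$\mu_4 = \frac{3b^4 m'_1 m'_2\{m'_1 m'_2 (r-6) + 2r^2\}}{r^4(r+1)(r+2)(r+3)}$$

We can simplify these expressions to obtain the equations on
p. 53 by writing $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$, $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$, and $\epsilon = m'_1 m'_2$; then

$$\beta_1 = \frac{4(r^2-4\epsilon)(r+1)}{\epsilon(r+2)^2}, \quad\text{or}\quad \frac{\beta_1(r+2)^2}{4(r+1)} = \frac{r^2}{\epsilon} - 4$$

and

$$\beta_2 = \frac{3(r+1)\{2r^2+\epsilon(r-6)\}}{\epsilon(r+2)(r+3)}, \quad\text{or}\quad \frac{\beta_2(r+2)(r+3)}{3(r+1)} = \frac{2r^2}{\epsilon} + r - 6.$$

Eliminating $\dfrac{r^2}{\epsilon}$ we find

$$\frac{\beta_1(r+2)^2}{2(r+1)} - \frac{\beta_2(r+2)(r+3)}{3(r+1)} = -(r+2).$$

Dividing out by $r + 2$ we have

$$r = \frac{6(\beta_2-\beta_1-1)}{3\beta_1-2\beta_2+6}.$$

Using this value in the equation $\dfrac{\beta_1(r+2)^2}{4(r+1)} = \dfrac{r^2}{\epsilon} - 4$,

$$\epsilon = \frac{r^2}{4+\tfrac{1}{4}\beta_1\dfrac{(r+2)^2}{r+1}},$$

and from the equation for $\mu_2$,

$$b^2 = \frac{\mu_2\,r^2(r+1)}{\epsilon}.$$

The other equations follow at once from $r = m'_1 + m'_2$ and
$\epsilon = m'_1 m'_2$. The distance between the mean and mode is
$a_1 - \mu'_1 = a_1 - b\,m'_1 \div (m'_1 + m'_2)$, which can be easily reduced
to the form given. A general value (regardless of type) for
the distance was given in Chap. IV., Art. 5.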


TYPE II.

$$y = y_0\left(1-\frac{x^2}{a^2}\right)^m$$

FORMULAE.

$$m = \frac{5\beta_2-9}{2(3-\beta_2)}$$

$$a^2 = \frac{2\mu_2\beta_2}{3-\beta_2}$$

$$y_0 = \frac{N \times \Gamma(2m+2)}{a\,2^{2m+1}\{\Gamma(m+1)\}^2}$$

NOTES AND PROOF.

Put $\beta_1 = 0$ in Type I., for the curve is symmetrical, and
therefore $\mu_3 = 0$. For the same reason it is clear that $m_1 = m_2$.

$\Gamma$ may be approximated to if $m$ is large.

If $m$ is positive, the curve starts at zero, rises to a
maximum and falls again to zero; but if $m$ is negative, it
starts at infinity, falls, and then rises to infinity again.

EXAMPLE.

In the discussion that followed the reading of Mr. Lidstone's
paper on Endowment Assurances, Mr. G. F. Hardy said that
"the errors in the successive groups formed a curve very
similar to the normal curve of error" (Journal of the Institute
of Actuaries, xxxiv., p. 87), and the series in question is a
rather interesting example of a symmetrical distribution.


| Unexpired Term in Years. | Error involved in using "Mean Age" Method. |
|---|---|
| 0-4 | 11 |
| 5-9 | 116 |
| 10-14 | 274 |
| 15-19 | 451 |
| 20-24 | 432 |
| 25-29 | 267 |
| 30-34 | 116 |
| 35, &c. | 16 |
| Total | 1,683 |

Moments were calculated about the centre of the 15-19
group, and .4985146, 2.161022, 3.104576, and 12.60666 were
found for the first four moments; transferring to the mean
($17.5 + 2.492573 = 19.992573$), and using Sheppard's adjustments,
the following values result:

$\mu_2 = 1.829172$

$\mu_3 = .120452$

$\mu_4 = 8.52636$

$\beta_1 = .0023706$

$\beta_2 = 2.548313$

$\kappa = -.007492$,

which shows that Type II. can be used. The equations for the type give

$m = 4.141766$

$a = 4.543079$

$y_0 = 462.57$

The mean and mode coincide, because the curve is symmetrical.
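
The three Type II. constants can be checked with a short Python sketch. The function name is hypothetical; it assumes $\beta_2 < 3$ so that $a^2$ is positive, and it works $y_0$ through `math.lgamma` rather than a table of $\Gamma$ functions.

```python
import math


def type2_constants(N, mu2, beta2):
    """m, a and y0 of a Type II. curve (assumes beta2 < 3)."""
    m = (5 * beta2 - 9) / (2 * (3 - beta2))
    a = math.sqrt(2 * mu2 * beta2 / (3 - beta2))
    # y0 = N * Gamma(2m+2) / (a * 2^(2m+1) * Gamma(m+1)^2), taken in logarithms.
    log_y0 = (math.log(N) + math.lgamma(2 * m + 2)
              - math.log(a) - (2 * m + 1) * math.log(2)
              - 2 * math.lgamma(m + 1))
    return m, a, math.exp(log_y0)


m, a, y0 = type2_constants(1683, 1.829172, 2.548313)
print(round(m, 5), round(a, 5), round(y0, 2))

# Range of the curve in years: mean -/+ a, with a converted from 5-year units.
mean = 19.992573
print(round(mean - 5 * a, 4), round(mean + 5 * a, 4))
```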
For calculating a series of values, the following
arrangement is convenient:

(1) $\dfrac{x}{a}$; (2) $\log\left(1+\dfrac{x}{a}\right)$; (3) $\log\left(1-\dfrac{x}{a}\right)$; (4) (2) + (3); (5) $m \times$ (4) $+ \log y_0$.

It is easier to work in this way than by calculating values of
$1 - \dfrac{x^2}{a^2}$. In the particular example, ordinates were calculated
at the beginning, middle, and end of each group, and
Simpson's quadrature formula was used for finding the areas,
viz., $\int_0^1 y\,dx = \tfrac{1}{6}\{y_0 + 4y_{1/2} + y_1\}$.

| Group. | Areas. | Mid-ordinates. |
|---|---|---|
| 0-4 | 14 | 11 |
| 5-9 | 109 | 104 |
| 10-14 | 286 | 287 |
| 15-19 | 433 | 440 |
| 20-24 | 433 | 440 |
| 25-29 | 285 | 287 |
| 30-34 | 109 | 104 |
| 35, &c. | 14 | 11 |
| Total | 1,683 | |

A comparison of the mid-ordinates with the areas gives an
idea of the error involved in using the former for the latter;
the differences are largest at the "tails" and near the mode.

The curve starts at $19.992573 - 22.71540 = -2.72283$, and
ends at 42.70797.

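It sometimes happens that $\beta_2 > 3$, and so $a^2$ and $m$ are
negative; if $a'^2 = -a^2$ and $m' = -m$, in such a case the equation
to the curve becomes

$$y = y_0\left(1+\frac{x^2}{a'^2}\right)^{-m'}$$

and the value of $y_0$ can best be found by

$$N = \int_{-\infty}^{+\infty} y_0\left(1+\frac{x^2}{a'^2}\right)^{-m'}dx = 2\int_0^{\infty} y_0\left(1+\frac{x^2}{a'^2}\right)^{-m'}dx;$$

then putting $\dfrac{x^2}{a'^2+x^2} = z$, the reader will have no difficulty in
showing that

$$N = \int_0^1 y_0\,a'\,(1-z)^{m'-\frac{3}{2}}\,z^{-\frac{1}{2}}\,dz = a'\,y_0\,B(m'-\tfrac{1}{2}, \tfrac{1}{2})$$

by Appendix II., or

$$y_0 = \frac{N}{a'}\cdot\frac{\Gamma(m')}{\sqrt{\pi}\;\Gamma(m'-\tfrac{1}{2})}, \quad\text{since } \Gamma(\tfrac{1}{2}) = \sqrt{\pi}.$$

In a similar manner we could show that an alternative
value for $y_0$ to that given on the previous page is

$$y_0 = \frac{N}{a}\cdot\frac{\Gamma(m+1\tfrac{1}{2})}{\sqrt{\pi}\;\Gamma(m+1)}.$$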


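[Figure: the Type II. curve of the example; horizontal axis marked from 15 to 40.]

TYPE III.

$$y = y_0\left(1+\frac{x}{a}\right)^{\gamma a} e^{-\gamma x}$$

FORMULAE.

$$\gamma = \frac{2\mu_2}{\mu_3}$$

$$a = \frac{2\mu_2^2}{\mu_3} - \frac{\mu_3}{2\mu_2}$$

$$\text{Mode} = \text{Mean} - \frac{\mu_3}{2\mu_2}$$

NOTES.

If $p$ is positive, the shape of the curve is like that shown
in the example of Type I.; but instead of ending at a
fixed point, it goes to infinity.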

EXAMPLE. 

The following statistics are taken from a paper in the
Transactions of the Actuarial Society of Edinburgh, vol. iv.,
p. 44, and give the numbers of wives tabulated for the ages
of mothers, and according to years since marriage. The
mothers' ages for the particular series are 30 to 34.

| Year after Marriage. | Number of Wives. | Graduated by Type III. Curve. |
|---|---|---|
| 1 | 44 | 59 |
| 2 | 135 | 111 |
| 3 | 45 | 45 |
| 4 | 12 | 20 |
| 5 | 8 | 9 |
| 6 | 3 | 4 |
| 7 | 1 | 2 |
| 8 | 3 | 1 |
| Total | 251 | 251 |

The mean is .3346612 after the middle of the second
group, and the moments about the centroid vertical are
1.441787, 3.606622, and 18.93221; so that $\kappa = -8.44$.

As this value was large, Type III. was used, and

$\gamma = .7995221$

$p = -.0783584$

$a = -.098007$

$y_0 = 214.8$
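
The constants $\gamma$, $a$ and $p = \gamma a$, together with the position of the mode and of the start of the curve discussed next, follow directly from $\mu_2$ and $\mu_3$. The Python sketch below, with a hypothetical function name, reproduces them. It assumes the middle of the second group lies at duration 1.5, which is what the quoted mode implies, and it does not attempt $y_0$.

```python
def type3_constants(mu2, mu3):
    """gamma, a, p and (mean - mode) for a Type III. curve."""
    gamma = 2 * mu2 / mu3
    a = 2 * mu2 ** 2 / mu3 - mu3 / (2 * mu2)
    p = gamma * a
    mean_minus_mode = mu3 / (2 * mu2)
    return gamma, a, p, mean_minus_mode


gamma, a, p, shift = type3_constants(1.441787, 3.606622)
mean = 1.5 + 0.3346612      # assumed: middle of the second group at 1.5
mode = mean - shift
start = mode - a            # a is negative here, so the start lies beyond the mode
print(round(gamma, 7), round(a, 6), round(p, 7))
print(round(mode, 5), round(start, 5))
```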

This example is given because it is one which shows a
difficulty rather clearly. At first sight, a curve starting at zero,
rising to a maximum, and then falling, might be expected.
In reality, we find the curve starting at duration .68192;*
so that the first group is made up of a strip on a base
.31808 in length, and has a smaller value than the next
group, though, of course, any ordinate read off within the
first group would be larger than any ordinate in the second
group. No adjustment was made to the rough moments.

* The mode in ordinary cases of Type III. is given by mean $- \dfrac{\mu_3}{2\mu_2}$. In this
case $\dfrac{\mu_3}{2\mu_2} = 1.25075$; so the mode would be at .58391, and the curve would start
at {"mode" $- a$} $= .58391 + .09801 = .68192$.

[Figure: Type III. curve; vertical axis marked from 20 to 180, horizontal axis "Duration" marked from 0 to 8.]

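PROOF.

In the equation for the type, viz., $y = y_0\left(1+\dfrac{x}{a}\right)^{\gamma a} e^{-\gamma x}$, put
$\gamma a = p$, and substitute $z$ for $\gamma(a + x)$; then, if $N$ be the total
frequency,

$$N = \int_{-a}^{\infty} y_0\left(1+\frac{x}{a}\right)^{\gamma a} e^{-\gamma x}\,dx = \int_0^{\infty} y_0\,z^p a^{-p} e^{-z+p}\,\gamma^{-(p+1)}\,dz, \quad\text{for } \frac{dx}{dz} = \frac{1}{\gamma}$$

$$= \frac{y_0\,e^p}{a^p\gamma^{p+1}}\int_0^{\infty} z^p e^{-z}\,dz = \frac{y_0\,e^p}{a^p\gamma^{p+1}}\,\Gamma(p+1).$$

This gives

$$y_0 = \frac{N}{a}\cdot\frac{p^{p+1}}{e^p\,\Gamma(p+1)}.$$

The $n$th moment about the start of the curve is

$$\frac{\Gamma(p+n+1)}{\gamma^n\,\Gamma(p+1)}$$

by using the value of $N$ found above.

Since $\Gamma(p) = (p-1)\Gamma(p-1)$, the first moment is $\dfrac{p+1}{\gamma}$, the
second $\dfrac{(p+1)(p+2)}{\gamma^2}$, and the third $\dfrac{(p+1)(p+2)(p+3)}{\gamma^3}$. In
order to apply these formulae to statistical work, it is necessary
to have moments about the centroid vertical, the position of
which (the mean) can be found; and as, by definition, the
first moment about it is zero, we get

$$\mu_2 = \frac{p+1}{\gamma^2}, \qquad \mu_3 = \frac{2(p+1)}{\gamma^3}.$$

These results give $\gamma$ and $p$ as $\dfrac{2\mu_2}{\mu_3}$ and $\dfrac{4\mu_2^3}{\mu_3^2} - 1$ respectively.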

TYPE IV.

$$y = y_0\left(1+\frac{x^2}{a^2}\right)^{-m} e^{-\nu\tan^{-1}(x/a)}$$

FORMULAE.

$$r = \frac{6(\beta_2-\beta_1-1)}{2\beta_2-3\beta_1-6}$$

$$m = \tfrac{1}{2}(r+2)$$

$$\nu = \frac{r(r-2)\sqrt{\beta_1}}{\sqrt{\{16(r-1)-\beta_1(r-2)^2\}}}$$

$$a = \sqrt{\frac{\mu_2}{16}}\,\sqrt{\{16(r-1)-\beta_1(r-2)^2\}}$$

$$y_0 = \frac{N}{a\,G(r,\nu)}$$

$$\text{Sk.} = \tfrac{1}{2}\sqrt{\beta_1}\,\frac{r-2}{r+2}$$

The origin is not at the mode, but is $\dfrac{\nu a}{r}$ from the mean, i.e.,

$$\text{origin} = \text{mean} + \frac{\nu a}{r}$$

$$\text{mode} = \text{mean} - \frac{\mu_3(r-2)}{2\mu_2(r+2)}$$

$$y_0 = \frac{N}{a}\sqrt{\frac{r}{2\pi}}\cdot\frac{e^{\frac{\cos^2\phi}{3r}-\frac{1}{12r}-\phi\nu}}{(\cos\phi)^{r+1}}$$

is a close approximation where $\tan\phi = \dfrac{\nu}{r}$.

NOTES.

$\mu_3$ and $\nu$ have opposite signs, i.e., when $\mu_3$ is positive
$\nu$ is negative.

A simple way to calculate the curve is to put it in the form

$$x = a\tan\theta$$
$$y = y_0\cos^{r+2}\theta\;e^{-\nu\theta}$$

Then $\theta$ is taken as 10°, 20°, 30°, &c., and $x$ and $y$ found; this
gives corresponding values of $x$ and $y$, but the values of $y$ will
not be for equidistant values of $x$. In calculating $e^{\nu\theta}$ the value
of $\theta$ must be taken in circular measure. If equidistant
ordinates are required to be calculated accurately, little is
gained by the double form, and if we had good tables of
$\log(1+x^2)$ and $\tan^{-1}x$, the calculation of a particular ordinate
would be a very simple matter.

The calculation and meaning of $G(r, \nu)$ are dealt with in
the proof.

EXAMPLES. 

The numbers in the following nearly symmetrical
distribution represent the exposed to risk of sickness by
Sutton's Sickness Tables (males, all durations) when the
number of weeks' sickness is represented by the normal
curve of error (Type VII.).

| Central Age. | No. Exposed. | Graduated by Type IV. |
|---|---|---|
| 5 | 10 | 6* |
| 10 | 13 | 16 |
| 15 | 41 | 49 |
| 20 | 115 | 135 |
| 25 | 326 | 321 |
| 30 | 675 | 653 |
| 35 | 1,113 | 1,108 |
| 40 | 1,528 | 1,535 |
| 45 | 1,692 | 1,712 |
| 50 | 1,530 | 1,522 |
| 55 | 1,122 | 1,074 |
| 60 | 610 | 604 |
| 65 | 255 | 274 |
| 70 | 86 | 102 |
| 75 | 26 | 32 |
| 80 | 8 | 8 |
| 85 | 2 | 2 |
| 90 | 1 | 1 |
| 95 | 1 | |
| Total | 9,154 | 9,154 |

* This group has been taken as the area of the rest of the curve.

The following values were obtained:

Mean $= 44.5772339$

$\mu_2 = 4.527608$

$\mu_3 = -.705687$

$\mu_4 = 64.98048$

$\beta_1 = .0053656$

$\beta_2 = 3.169897$

$\kappa = .0125$

Type IV. was used because, as there is a large number of
cases, the probable error of $\kappa$ will be small (see Chapter VIII.).

$r = 40.12143$

$\nu = 4.450399$ (positive because $\mu_3$ is negative)

$a = 13.39152$

$m = 21.06072$

Sk. $= -.03313$

When the 5-years unit with which we have been working
is changed to one year, $a$ becomes 66.9576, and $a^2 = 4483.325$.

The origin $=$ mean $+ \dfrac{\nu a}{r} = 52.504394$

The mode, which is wanted if the curve is drawn, is at
44.92989.

As $r$ is large the approximate form for $y_0$ was used,
$\tan\phi = \dfrac{4.450398}{40.12143}$ [the intermediate figures for $\phi$ are not legible in this copy]; hence
$\log\cos\phi = \bar{1}.9973446$, and from this $y_0$ is found to be
273.3649.

The value was checked by Dr. Alice Lee's tables (see
Appendix V).
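
The Type IV. formulae and the approximate form for $y_0$ can be sketched in Python as follows. The function names are hypothetical. The first function evaluates the formulae as printed, giving $\nu$ the sign opposite to $\mu_3$ and the skewness the sign of $\mu_3$; the second evaluates the approximation for $y_0$ and is called here with the printed constants, $a$ being in one-year units. Evaluated directly in floating point, $r$, $m$, the skewness and the mode agree with the printed figures, while $\nu$ and $a$ come out near 4.51 and 13.23, a little different from the printed 4.450399 and 13.39152; the sketch does not attempt to account for that difference.

```python
import math


def type4_constants(mu2, mu3, mu4):
    """r, m, nu, a, skewness and (mean - mode) from the moments."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    r = 6 * (beta2 - beta1 - 1) / (2 * beta2 - 3 * beta1 - 6)
    m = 0.5 * (r + 2)
    root = math.sqrt(16 * (r - 1) - beta1 * (r - 2) ** 2)
    nu = -math.copysign(r * (r - 2) * math.sqrt(beta1) / root, mu3)
    a = math.sqrt(mu2 / 16) * root
    skewness = math.copysign(0.5 * math.sqrt(beta1) * (r - 2) / (r + 2), mu3)
    mean_minus_mode = mu3 * (r - 2) / (2 * mu2 * (r + 2))
    return {"r": r, "m": m, "nu": nu, "a": a,
            "skewness": skewness, "mean_minus_mode": mean_minus_mode}


def type4_y0_approx(N, a, r, nu):
    """Approximate y0 for large r, with tan(phi) = nu / r."""
    phi = math.atan(nu / r)
    exponent = math.cos(phi) ** 2 / (3 * r) - 1 / (12 * r) - phi * nu
    return (N / a) * math.sqrt(r / (2 * math.pi)) * math.exp(exponent) \
        / math.cos(phi) ** (r + 1)


c = type4_constants(4.527608, -0.705687, 64.98048)
print({k: round(v, 5) for k, v in c.items()})
print("mode at age", round(44.5772339 - 5 * c["mean_minus_mode"], 5))
print("y0", round(type4_y0_approx(9154, 66.9576, 40.12143, 4.450398), 3))
```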

The calculation of ordinates by the double process is as
follows:

  θ      x in years    4.450398 θ log₁₀e    42.12143 log cos θ     log y        y
          of age.

  0°        0                ...                   ...             2.43675    273.37
  1°      1.1687           1̄.96637              1̄.99721           2.40033    251.38
  2°      2.3382           1̄.93253              1̄.98885           2.35813    228.10
The second column is formed directly from the tables of
$\tan\theta$ by multiplying by $a$, and as $x$ is required in years,
$13.39152 \times 5 = 66.9576$ should be used for $a$. The fourth
column is formed by multiplying L $\cos\theta$, and the third
continuously by addition. When $\theta$ is negative, the fourth
column has to be subtracted from the fifth: i.e., it ceases
to be negative and becomes positive. In each case the sixth
is formed from the fourth + the fifth + $\log y_0$.
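As a rough modern check on this tabular work, the short Python sketch below recomputes the rows for 0° and 2° from $x = a\tan\theta$ and $\log y = \log y_0 - \nu\theta\log_{10}e + (r + 2)\log_{10}\cos\theta$. The constants are those quoted in the example; the function name `type_iv_row` is purely illustrative and forms no part of the original method.

```python
import math

# Constants quoted in the worked example (a in years: 13.39152 x 5).
R = 40.12143
NU = 4.450398
A_YEARS = 66.9576
Y0 = 273.3649


def type_iv_row(theta_deg):
    """Return (x, log10 y, y) for an angle theta given in degrees.

    x is measured from the origin of the curve, in years of age.
    """
    theta = math.radians(theta_deg)
    x = A_YEARS * math.tan(theta)
    log_y = (math.log10(Y0)
             - NU * theta * math.log10(math.e)
             + (R + 2.0) * math.log10(math.cos(theta)))
    return x, log_y, 10.0 ** log_y


if __name__ == "__main__":
    for deg in (0, 1, 2):
        x, log_y, y = type_iv_row(deg)
        print(f"{deg} deg  x = {x:.4f}  log y = {log_y:.5f}  y = {y:.2f}")
```

Negative angles may be passed in the same way; the exponential term then increases the ordinate instead of reducing it.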

If the calculation is made directly, the following columns
would be required:

(1) $x$
(2) $1 + \dfrac{x^2}{a^2}$
(3) $\log\left(1 + \dfrac{x^2}{a^2}\right)$
(4) $\tan^{-1}\dfrac{x}{a}$ in degrees, &c.
(5) col. (4) in circular measure
(6) $-\,\text{col. (5)} \times \nu\log_{10}e$
(7) $-\,m \times \text{col. (3)}$
(8) $\log y = \log y_0 + (6) + (7)$
(9) $y$

Col. (2) can be formed best by differences since
$\Delta(1 + X^2) = 2X + 1$; $\tan^{-1}\dfrac{x}{a}$ has to be found by using a table
of the tangents of angles inversely. A table helpful for
obtaining col. (5) from col. (4) will be found on pp. 251 to 262
of Chambers' Mathematical Tables (1897 edition).

When drawing a curve of this type the position and
height of the mode can be noted and then corresponding
points inserted, e.g., $x = +1.1687$ and $y = 251.38$. Care must
be taken to give the curve its maximum at the right
point.


[Figure: Type IV. Frequency curve plotted against central age, 5 to 95; the mean is marked.]
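PROOF.

In $y = y_0\left(1 + \dfrac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}\frac{x}{a}}$ put $\tan\theta = \dfrac{x}{a}$, so that $\theta = \tan^{-1}\dfrac{x}{a}$ and

$$\left(1 + \frac{x^2}{a^2}\right)^{-m} = \{1 + \tan^2\theta\}^{-m} = (\sec^2\theta)^{-m} = \cos^{2m}\theta,$$

$$y = y_0\cos^{2m}\theta\,e^{-\nu\theta}.$$

Now

$$N = \int_{-\infty}^{\infty} y_0\left(1 + \frac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}\frac{x}{a}}\,dx = \int_{-\pi/2}^{\pi/2} y_0\cos^{2m}\theta\,e^{-\nu\theta}\,\frac{a}{\cos^2\theta}\,d\theta,$$

by substituting $\tan\theta = \dfrac{x}{a}$ so that $\dfrac{dx}{d\theta} = a\sec^2\theta = \dfrac{a}{\cos^2\theta}$,

$$= y_0a\int_{-\pi/2}^{\pi/2}\cos^{r}\theta\,e^{-\nu\theta}\,d\theta, \quad\text{where } r = 2m - 2,$$

$$= y_0a\,e^{-\frac{1}{2}\pi\nu}\int_0^{\pi}\sin^{r}\phi\,e^{\nu\phi}\,d\phi,$$

substituting $\sin\phi$ for $\cos\theta$ so that $\phi + \theta = \tfrac{1}{2}\pi$ and the limits are changed,

$$= y_0a\,e^{-\frac{1}{2}\pi\nu}\,G(r, \nu), \text{ say.}$$

The $n$th moment about the origin is

$$\mu'_n = \frac{1}{N}\int_{-\infty}^{\infty} y_0x^n\left(1 + \frac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}\frac{x}{a}}\,dx$$

$$= \frac{y_0a^{n+1}}{N}\int_{-\pi/2}^{\pi/2}\cos^{2m-2}\theta\,\tan^{n}\theta\,e^{-\nu\theta}\,d\theta, \quad\text{by substituting as above,}$$

$$= \frac{y_0a^{n+1}}{N}\int_{-\pi/2}^{\pi/2}\cos^{r-n}\theta\,\sin^{n}\theta\,e^{-\nu\theta}\,d\theta$$

$$= \frac{y_0a^{n+1}}{N}\left\{\left[-\frac{\cos^{r-n+1}\theta\,\sin^{n-1}\theta\,e^{-\nu\theta}}{r - n + 1}\right]_{-\pi/2}^{\pi/2} + \frac{1}{r - n + 1}\int_{-\pi/2}^{\pi/2}\cos^{r-n+1}\theta\left(\sin^{n-2}\theta\cos\theta\,e^{-\nu\theta}(n - 1) - \nu e^{-\nu\theta}\sin^{n-1}\theta\right)d\theta\right\},$$

by integrating by parts and treating $\sin^{n-1}\theta\,e^{-\nu\theta}$ as one part and $\cos^{r-n}\theta\sin\theta$ as the other, and remembering that

$$\int\cos^{r-n}\theta\,\sin\theta\,d\theta = -\frac{\cos^{r-n+1}\theta}{r - n + 1}.$$

Now, since $\cos^{r-n+1}\theta\,\sin^{n-1}\theta\,e^{-\nu\theta} = 0$ when $\theta$ becomes $\dfrac{\pi}{2}$ or $-\dfrac{\pi}{2}$, we have

$$\mu'_n = \frac{y_0a^{n+1}}{N(r - n + 1)}\int_{-\pi/2}^{\pi/2}\left\{(n - 1)\cos^{r-n+2}\theta\,\sin^{n-2}\theta - \nu\cos^{r-n+1}\theta\,\sin^{n-1}\theta\right\}e^{-\nu\theta}\,d\theta$$

$$= \frac{a}{r - n + 1}\left\{(n - 1)\,a\,\mu'_{n-2} - \nu\,\mu'_{n-1}\right\}.$$

Further,

$$\mu'_1 = \frac{y_0a^{2}}{N}\int_{-\pi/2}^{\pi/2}\cos^{r}\theta\,\tan\theta\,e^{-\nu\theta}\,d\theta = -\frac{y_0a^{2}}{Nr}\int_{-\pi/2}^{\pi/2}\nu\cos^{r}\theta\,e^{-\nu\theta}\,d\theta$$

(by putting $n = 1$ in the above equation for $\mu'_n$)

$$= -\frac{\nu a}{r}, \quad\text{because } N = y_0a\int_{-\pi/2}^{\pi/2}\cos^{r}\theta\,e^{-\nu\theta}\,d\theta.$$

Using the last result with the formula for the $n$th in terms of the two previous moments, and remembering that $\mu'_0$ is unity,

$$\mu'_2 = \frac{a^2}{r(r - 1)}(r + \nu^2),$$

$$\mu'_3 = -\frac{a^3\nu}{r(r - 1)(r - 2)}(3r - 2 + \nu^2),$$

$$\mu'_4 = \frac{a^4}{r(r - 1)(r - 2)(r - 3)}\left\{3r(r - 2) + \nu^2(6r - 8) + \nu^4\right\}.$$

Referring these moments to the centroid vertical, we have, by putting $d = \mu'_1 = -\dfrac{\nu a}{r}$ in the formulae on p. 19,

$$\mu_2 = \frac{a^2}{r^2(r - 1)}(r^2 + \nu^2),$$

$$\mu_3 = -\frac{4a^3\nu(r^2 + \nu^2)}{r^3(r - 1)(r - 2)},$$

$$\mu_4 = \frac{3a^4(r^2 + \nu^2)\{(r + 6)(r^2 + \nu^2) - 8r^2\}}{r^4(r - 1)(r - 2)(r - 3)}.$$

If now, we put $z$ for $r^2 + \nu^2$, and write as before, $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$ and $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$, we have,

$$-\frac{\beta_1(r - 2)^2}{2(r - 1)} = \frac{8r^2}{z} - 8$$

and

$$\frac{\beta_2(r - 2)(r - 3)}{3(r - 1)} = -\frac{8r^2}{z} + r + 6.$$

Adding and dividing out by $r - 2$, we have

$$r = \frac{6(\beta_2 - \beta_1 - 1)}{2\beta_2 - 3\beta_1 - 6},$$

and

$$z = \frac{r^2}{1 - \dfrac{\beta_1(r - 2)^2}{16(r - 1)}}.$$

Finally, since $\nu^2 = z - r^2$, the other formulae on p. 69 follow at once.

Since the tangent at the top of the maximum ordinate is parallel to the axis of $x$, the position of the mode is such that $\dfrac{dy}{dx}$ is zero at that point, i.e.,

$$-y_0\left(1 + \frac{x^2}{a^2}\right)^{-(m+1)}e^{-\nu\tan^{-1}\frac{x}{a}}\left[\frac{2mx}{a^2} + \frac{\nu}{a}\right]$$

is zero. There are three cases, $x = -\infty$, $x = +\infty$, and a value of $x$ such that $\dfrac{2mx}{a^2} + \dfrac{\nu}{a}$ is zero, or $x = -\dfrac{\nu a}{2m}$. The distance of the mean from the origin is $\mu'_1$ or $-\dfrac{\nu a}{r}$, and, therefore, the distance between the mean and mode is $\dfrac{2\nu a}{r(r + 2)}$, which reduces to the expression given on p. 69, when the values for $\nu$ and $a$, on the same page, are inserted.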
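The algebra of the proof can be checked numerically. The sketch below (in Python; the function names and the trial values of $r$, $\nu$ and $a$ are hypothetical and chosen only for illustration) builds the moments about the origin from the recurrence $\mu'_n = \dfrac{a}{r - n + 1}\{(n - 1)a\mu'_{n-2} - \nu\mu'_{n-1}\}$, transfers them to the mean, and compares the results with the closed forms for $\mu_2$, $\mu_3$ and $\mu_4$.

```python
import math


def raw_moments(r, nu, a, n_max=4):
    """Moments about the origin from the two-term recurrence."""
    mu = [1.0, -nu * a / r]  # mu'_0 = 1, mu'_1 = -nu*a/r
    for n in range(2, n_max + 1):
        mu.append(a / (r - n + 1) * ((n - 1) * a * mu[n - 2] - nu * mu[n - 1]))
    return mu


def central_from_raw(mu):
    """Transfer the first four raw moments to the centroid vertical."""
    m1, m2, m3, m4 = mu[1:5]
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return c2, c3, c4


def closed_forms(r, nu, a):
    """Closed expressions for mu_2, mu_3, mu_4 with z = r^2 + nu^2."""
    z = r * r + nu * nu
    c2 = a ** 2 * z / (r ** 2 * (r - 1))
    c3 = -4 * a ** 3 * nu * z / (r ** 3 * (r - 1) * (r - 2))
    c4 = (3 * a ** 4 * z * ((r + 6) * z - 8 * r * r)
          / (r ** 4 * (r - 1) * (r - 2) * (r - 3)))
    return c2, c3, c4


if __name__ == "__main__":
    # Trial values only; any r > 3 will serve.
    for r, nu, a in [(5.0, 1.0, 1.0), (12.5, -2.0, 3.0)]:
        print(central_from_raw(raw_moments(r, nu, a)), closed_forms(r, nu, a))
```

The two triples printed for each set of trial values should agree to within rounding.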

It will be useful to give another example of the calculation
of $y_0$ for curves of this type, and we may take a curve in which
$r = 29.590$, $\nu = 19.886$, $a = 13.650$, $N = 2162$. $\therefore \tan\phi = .67205$,
$\therefore \phi = 33^\circ\,54'$, $\cos\phi = .82998$, $\log\cos\phi = \bar{1}.91907$, and $\phi$ in
circular measure is .59172.

$\log N = 3.33486$
$\text{colog } a = \bar{2}.86486$
$\tfrac{1}{2}\log r = .73557$
$\log\dfrac{1}{\sqrt{2\pi}} = \bar{1}.60091$
$\left(\dfrac{\cos^2\phi}{3r} - \dfrac{1}{12r} - \phi\nu\right)\log_{10}e = (.00776 - .00282 - 11.76700)\log_{10}e = -11.762 \times \log_{10}e = \bar{6}.89183$
$\text{colog } (\cos\phi)^{r+1} = 2.47564$

$\log y_0 = \bar{1}.90367$
$y_0 = .80107$
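The terms set out above correspond to the approximate form $y_0 = \dfrac{N}{a}\sqrt{\dfrac{r}{2\pi}}\;\dfrac{e^{\cos^2\phi/3r - 1/12r - \phi\nu}}{(\cos\phi)^{r+1}}$ with $\tan\phi = \nu/r$. A minimal Python sketch of that arithmetic follows; the function name is an assumption made for illustration, and natural logarithms and exponentials take the place of the common logarithms used in the text.

```python
import math


def type_iv_y0_approx(n_total, r, nu, a):
    """Approximate y0 for a Type IV curve when r is large.

    y0 = (N/a) * sqrt(r / (2*pi))
         * exp(cos(phi)**2 / (3r) - 1 / (12r) - phi*nu)
         / cos(phi)**(r + 1),   where tan(phi) = nu / r.
    """
    phi = math.atan(nu / r)
    exponent = math.cos(phi) ** 2 / (3 * r) - 1 / (12 * r) - phi * nu
    return ((n_total / a) * math.sqrt(r / (2 * math.pi))
            * math.exp(exponent) / math.cos(phi) ** (r + 1))


if __name__ == "__main__":
    # Second example: r = 29.590, nu = 19.886, a = 13.650, N = 2162.
    print(type_iv_y0_approx(2162, 29.590, 19.886, 13.650))
    # First example, with a expressed in years (66.9576) and N = 9,154.
    print(type_iv_y0_approx(9154, 40.12143, 4.450398, 66.9576))
```

Both results should lie close to the values .80107 and 273.3649 found by logarithms, small differences being due to the rounding of the tabular work.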

The form just considered is sufficiently accurate for all
practical purposes provided $\nu$ is not very small. If, however,
$\nu$ is less than 2, $G(r, \nu)$ should be calculated by


2i/7re-W>+l) 


n=»       f  r2-\-v2 

Product  (1  + 


4<1+l) 


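TYPE V.

$$y = y_0\,x^{-p}\,e^{-\gamma/x}$$

FORMULAE.

$$p = 4 + \frac{8 + 4\sqrt{4 + \beta_1}}{\beta_1}$$

$$\gamma = (p - 2)\sqrt{\mu_2(p - 3)}$$

$$y_0 = \frac{N\gamma^{p-1}}{\Gamma(p - 1)}$$

$$\text{Origin} = \text{Mean} - \frac{\gamma}{p - 2}$$

$$\text{Mode} = \text{Mean} - \frac{2\gamma}{p(p - 2)}$$

The sign of $\gamma$ is the same as that of $\mu_3$.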
EXAMPLE.

The following series of deaths is taken from Mr. King's
paper "On the rate of Mortality amongst Female Nominees,
&c." (Journal of the Institute of Actuaries, xxxiii., pp. 262-8):

Ages.         Deaths.    Graduated by
                           Type V.

30-34             1             1
35-39             5             3
40-44             8             6
45-49            12            14
50-54            28            32
55-59            82            68
60-64           128           137
65-69           253           247
70-74           342           381
75-79           525           480
80-84           438           441
85-89           265           261
90-94            53            80
95-99            18            10
100, &c.          4             1

              2,162         2,162

The mean is at age 75.9782605, and the moments (adjusted),
&c., are

$\mu_2 = 3.573346$
$\mu_3 = -4.752613$
$\mu_4 = 51.02583$
$\beta_1 = .4950399$
$\beta_2 = 3.996134$
$\kappa =$
Strictly speaking, Type IV. should be used, but the value of $\kappa$ is
not very far from unity, and the following Type V. constants
were found:

$p = 37.29145$
$\gamma = -390.6609$ (negative, because $\mu_3$ is)
$\log y_0 = 56.930518$

The approximation to the value of $\log\Gamma(p - 1)$ was used. The
origin is at age 131.32606, and the mode at 78.9467.
The columns used for calculating the ordinates were

(1) $x$
(2) $\log x$
(3) $-p\log x$
(4) $-\dfrac{\gamma\log_{10}e}{x}$
(5) $\log y = \log y_0 + (3) + (4)$
(6) $y = \text{antilog } (5)$

Col. (4) is best formed by putting $\gamma\log_{10}e$ on the plate of
the arithmometer, and multiplying it by $\dfrac{1}{x}$, obtained, of course,
from a table of reciprocals.

The  point  to  be  borne  in  mind  in  drawing  a  curve  of  this 
type  is  that  as  the  mode  and  origin  are  not  at  the  same  place, 
care  must  be  taken  to  give  the  maximum  ordinate  its  right 
position  and  magnitude  (cf.  Type  IV.). 

The  graduated  figures  agree  fairly  closely  with  the 
original  statistics  below  the  90-94  group,  but  are  unsuitable 
for  that  and  the  two  later  groups.  The  reason  is  that  Type  IV. 
should  be  used,  and  curves  of  Type  V.  have  a  range  limited 
in  one  direction,  while  Type  IV.  curves  have  an  unlimited 
range. The particular case was chosen partly because an
example in which $\mu_3$ is negative is rather more awkward than
when $\mu_3$ is positive. In such cases it is a good check to
imagine the statistics written in inverse order (in this case
4, 18, 53, &c.), and so avoid the negative signs.
[Figure: Type V. Frequency curve plotted against age, 30 to 105.]
PROOF.

Putting $\dfrac{\gamma}{x} = z$ in $y = y_0e^{-\gamma/x}x^{-p}$, and integrating from
0 to $\infty$, we have

$$y_0 = \frac{N\gamma^{p-1}}{\Gamma(p - 1)}.$$

Using the same substitution, the $n$th moment about the
origin is

$$\mu'_n = \frac{1}{N}\int_0^{\infty} y_0\,x^{n-p}\,e^{-\gamma/x}\,dx = \frac{\gamma^{n}\,\Gamma(p - n - 1)}{\Gamma(p - 1)}.$$

This gives $\mu'_1 = \dfrac{\gamma}{p - 2}$, which is the distance between the mean and origin,

$$\mu'_2 = \frac{\gamma^2}{(p - 2)(p - 3)},$$

$$\mu'_3 = \frac{\gamma^3}{(p - 2)(p - 3)(p - 4)}.$$

Transferring the moments to the centroid vertical

$$\mu_2 = \frac{\gamma^2}{(p - 2)^2(p - 3)}$$

and

$$\mu_3 = \frac{4\gamma^3}{(p - 2)^3(p - 3)(p - 4)},$$

$$\therefore\ \beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{16(p - 3)}{(p - 4)^2} = \frac{16}{p - 4} + \frac{16}{(p - 4)^2},$$

$$\therefore\ (p - 4)^2 - \frac{16}{\beta_1}(p - 4) - \frac{16}{\beta_1} = 0.$$

$p - 4$ will have to be taken as the positive root of the
equation, or $\gamma$, which from the above equations is given by
$(p - 2)\sqrt{\mu_2(p - 3)}$, will be imaginary.

Since the tangent to the curve at the top of the maximum
ordinate is parallel to the axis of $x$, the position of the mode is
such that $\dfrac{dy}{dx}$ is zero there, i.e., $y_0\,x^{-p-1}e^{-\gamma/x}\left(-p + \dfrac{\gamma}{x}\right)$ is zero.
$x = 0$ and $x = \infty$ give the cases in which the curve touches the
axis of $x$, and the other case, the one required, is when
$p - \dfrac{\gamma}{x} = 0$, or $x = \dfrac{\gamma}{p}$, i.e., the mode is $\dfrac{\gamma}{p}$ from the origin.
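The relations of the proof may be tried against the constants quoted in the Type V. example. The Python sketch below is offered only as an illustration: the function name and the returned dictionary are hypothetical, $p - 4$ is taken as the positive root of the quadratic in $\beta_1$, $\gamma$ is given the sign of $\mu_3$, the absolute value of $\gamma$ is assumed when forming $\log y_0$, and the exact log-gamma function replaces the approximation used in the text.

```python
import math


def type_v_constants(n_total, mean, mu2, mu3, unit=1.0):
    """Type V constants from the moments (moments in working units).

    `unit` converts distances in working units to the scale of `mean`
    (5 in the example, where the moments are in 5-year units).
    """
    beta1 = mu3 ** 2 / mu2 ** 3
    # Positive root of (p-4)^2 - (16/beta1)(p-4) - 16/beta1 = 0.
    p = 4.0 + (8.0 + 4.0 * math.sqrt(4.0 + beta1)) / beta1
    gamma = math.copysign((p - 2.0) * math.sqrt(mu2 * (p - 3.0)), mu3)
    origin = mean - unit * gamma / (p - 2.0)
    mode = mean - unit * 2.0 * gamma / (p * (p - 2.0))
    # Assumption: |gamma| is used in the logarithm of y0.
    log10_y0 = (math.log10(n_total)
                + (p - 1.0) * math.log10(abs(gamma))
                - math.lgamma(p - 1.0) / math.log(10.0))
    return {"p": p, "gamma": gamma, "origin": origin,
            "mode": mode, "log10_y0": log10_y0}


if __name__ == "__main__":
    print(type_v_constants(2162, 75.9782605, 3.573346, -4.752613, unit=5.0))
```

With the moments of the example the results should fall close to $p = 37.29145$, $\gamma = -390.6609$, origin 131.32606 and mode 78.9467; $\log y_0$ differs slightly from 56.930518 because the exact gamma function is used.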
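TYPE VI.

$$y = y_0\,(x - a)^{q_2}\,x^{-q_1}$$

FORMULAE.

$$r = \frac{6(\beta_2 - \beta_1 - 1)}{6 + 3\beta_1 - 2\beta_2}$$

$q_2$ and $-q_1$ are given by

$$\frac{r - 2}{2} \pm \frac{r(r + 2)}{2}\sqrt{\frac{\beta_1}{\beta_1(r + 2)^2 + 16(r + 1)}}$$

$$a = \tfrac{1}{2}\sqrt{\mu_2}\,\sqrt{\beta_1(r + 2)^2 + 16(r + 1)}$$

$$y_0 = \frac{N\,a^{q_1 - q_2 - 1}\,\Gamma(q_1)}{\Gamma(q_1 - q_2 - 1)\,\Gamma(q_2 + 1)}$$

$$\text{Sk.} = \tfrac{1}{2}\sqrt{\beta_1}\;\frac{r + 2}{r - 2}$$

$$\text{Origin} = \text{Mean} - \frac{a(q_1 - 1)}{q_1 - q_2 - 2}$$

$$\text{Mode} = \text{Mean} - \frac{1}{2}\,\frac{\mu_3}{\mu_2}\,\frac{r + 2}{r - 2}$$

NOTES.

The range is from $a$ to $\infty$, and the method is like that of
Type I. $r$ and $\epsilon$ are found exactly as in Type I., and $1 - q_1$
and $1 + q_2$ are the roots of $z^2 - rz + \epsilon = 0$, just as $1 + m_1$ and
$1 + m_2$ were in Type I. The origin is before the beginning of
the curve. $1 - q_1$ is taken with the negative root and $1 + q_2$
with the positive root when $\mu_3$ is negative, and vice versa.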

EXAMPLE.

The numbers of entrants in the recent limited payment
policies experience were summed in groups of ten years of
age and divided by 100, and the following series was
obtained:

No. of Entrants      Graduated by Type VI.
    ÷ 100.                  curve.

       1                      1
      56                     50
     167                    168
      98                    100
      34                     36
       9                     10
       2                      2
       1                     .5

     368                    368

The moments, &c., were:

mean at .402174 after the centre of 167 group
$\mu_2 = .928835$
$\mu_3 = .893096$
$\mu_4 = 4.088800$
$\beta_1 = .9953605$
$\beta_2 = 4.739349$
$\kappa_2 = 1.895$
$r = -33.42429$
$1 - q_1 = -41.03080$
$1 + q_2 = 7.60950$
$q_1 = 42.03080$
$q_2 = 6.60950$
$a = 10.37947$
$\log y_0 = 46.1821$
The origin is 12.74270 before the mean or 12.34053
before the centre of the 167 group, and the curve starts at
12.34053 - 10.37947 = 1.96106 before the centre of the largest
group. This makes the start of the curve at about age 10,
which is reasonable.

The curve was calculated as follows:

(1) $x$
(2) $\log x$
(3) $\log(x - a)$
(4) $-q_1\log x$
(5) $q_2\log(x - a)$
(6) $\log y$
(7) $y$

There  is  no  difficulty  in  writing  down  the  values  for 
columns  (2)  and  (3)  without  using  column  (1),  as  only  the 
whole  numbers  in  x  and  x—a  change,  the  decimal  remaining 
constant  so  long  as  equidistant  ordinates  are  required. 
Columns  (4)  and  (5)  are  obtained  directly,  and  column  (6)  by 
adding  columns  (4)  and  (5)  to  log  y0. 

The mode which is useful for drawing the curve is .02429
before the centre of the largest group.

The skewness is .443.
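The constants of this example can be retraced from the three moments. The Python sketch below is a hedged illustration, not the author's working: it assumes the relations $r = \dfrac{6(\beta_2 - \beta_1 - 1)}{6 + 3\beta_1 - 2\beta_2}$, $q_2, -q_1 = \dfrac{r - 2}{2} \pm \dfrac{r(r + 2)}{2}\sqrt{\dfrac{\beta_1}{\beta_1(r + 2)^2 + 16(r + 1)}}$, $a = \tfrac{1}{2}\sqrt{\mu_2}\sqrt{\beta_1(r + 2)^2 + 16(r + 1)}$, a distance $\dfrac{a(q_1 - 1)}{q_1 - q_2 - 2}$ from origin to mean, and $\text{Sk.} = \tfrac{1}{2}\sqrt{\beta_1}\,\dfrac{r + 2}{r - 2}$; it also assumes $\mu_3$ positive, as in the example. The function name is hypothetical.

```python
import math


def type_vi_constants(mu2, mu3, mu4):
    """Type VI constants from the moments (assumes mu3 > 0)."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    r = 6.0 * (beta2 - beta1 - 1.0) / (6.0 + 3.0 * beta1 - 2.0 * beta2)
    d = beta1 * (r + 2.0) ** 2 + 16.0 * (r + 1.0)
    spread = 0.5 * r * (r + 2.0) * math.sqrt(beta1 / d)
    q2 = 0.5 * (r - 2.0) + spread          # positive value
    q1 = -(0.5 * (r - 2.0) - spread)       # -q1 is the negative value
    a = 0.5 * math.sqrt(mu2) * math.sqrt(d)
    origin_to_mean = a * (q1 - 1.0) / (q1 - q2 - 2.0)
    ratio = (r + 2.0) / (r - 2.0)
    skewness = 0.5 * math.sqrt(beta1) * ratio
    mean_minus_mode = 0.5 * (mu3 / mu2) * ratio
    return {"r": r, "q1": q1, "q2": q2, "a": a,
            "origin_to_mean": origin_to_mean,
            "skewness": skewness, "mean_minus_mode": mean_minus_mode}


if __name__ == "__main__":
    c = type_vi_constants(0.928835, 0.893096, 4.088800)
    print(c)
    # Mode relative to the centre of the 167 group (mean is .402174 after it).
    print(0.402174 - c["mean_minus_mode"])
```

The last figure printed should be a small negative number, corresponding to a mode slightly before the centre of the largest group.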

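PROOF.

$$N = \int_a^{\infty} y_0\,(x - a)^{q_2}\,x^{-q_1}\,dx = y_0\,a^{q_2 - q_1 + 1}\int_0^1 z^{q_1 - q_2 - 2}(1 - z)^{q_2}\,dz,$$

by substituting $\dfrac{a}{z}$ for $x$,

$$= y_0\,a^{q_2 - q_1 + 1}\,\frac{\Gamma(q_1 - q_2 - 1)\,\Gamma(q_2 + 1)}{\Gamma(q_1)},$$

$$\therefore\ y_0 = \frac{N\,a^{q_1 - q_2 - 1}\,\Gamma(q_1)}{\Gamma(q_1 - q_2 - 1)\,\Gamma(q_2 + 1)}.$$

The $n$th moment about the origin is

$$\mu'_n = \frac{1}{N}\int_a^{\infty} y_0\,x^{n}\,(x - a)^{q_2}\,x^{-q_1}\,dx = \frac{y_0}{N}\,a^{n - q_1 + q_2 + 1}\,\frac{\Gamma(q_1 - q_2 - n - 1)\,\Gamma(q_2 + 1)}{\Gamma(q_1 - n)},$$

by the same substitution as that used above.

From this last result we obtain, by inserting the value of $y_0$,
and remembering the relationship between $\Gamma(q_1)$ and $\Gamma(q_1 - 1)$,
&c.,

$$\mu'_2 = \frac{a^2(q_1 - 1)(q_1 - 2)}{(q_1 - q_2 - 2)(q_1 - q_2 - 3)},$$

&c.

It will be noticed that these equations are the same as
those already obtained for Type I. if $m_1 = -q_1$ and $m_2 = q_2$.
Thus, we can use the whole of the Type I. solution, provided
we bear in mind that the range is from $x = a$ to $x = \infty$.

[Figure: Type VI.]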
TYPE VII.

NORMAL CURVE OF ERROR.

$$y = y_0\,e^{-x^2/c}$$

FORMULAE.

$$c = 2\mu_2$$

$$y_0 = \frac{N}{\sqrt{2\pi\mu_2}}$$
EXAMPLES.

The following table gives, in column (2), the sums assured
and bonuses, and in column (4) the reserves resulting from
grouping a number of Endowment Assurances according to
their office years of birth:

Central Age for        Sum Assured and Bonuses         Reserves ÷ 1,000.
5 groups of                   ÷ 1,000.
years of birth.     Ungraduated.   Graduated.     Ungraduated.   Graduated.
     (1)                (2)           (3)             (4)           (5)

     17                  11            13              .6            .6
     22                  48            40             2.8           2.7
     27                 124           104            11.5          10.9
     32                 213           202            27.7          30.1
     37                 281           282            59.1          58.4
     42                 295           288            84.7          80.1
     47                 185           214            74.1          76.9
     52                 104           116            50.5          52.2
     57                  40            44            23.2          25.0
     62                  15            13            12.2           8.4
     67                   3             3             1.3           2.4

    Total             1,319         1,319           347.7         347.7

The following table shows the moments and constants:

Constant.                Sum Assured and Bonus.      Reserves.

Mean age                      39.202426              43.967213
μ₂                             3.066840               2.769635
μ₃                              .650127                .029805
μ₄                            27.02516               22.40663
β₁                              .014653                .0000418
β₂                             2.873346               2.920997
κ                               .005                  -.0002
σ (= √μ₂)                      1.751237               1.664222
σ⁻¹                             .5710248               .6008813
y₀                           300.4760                83.34959

The criteria for the normal curve are $\kappa = 0$, $\beta_1 = 0$, and
$\beta_2 = 3$. The values given above do not differ very greatly
from these, but a comparison of the graduated and
ungraduated figures shows that the reserve curve agrees
better than the sum assured curve; partly because the
value of $\beta_2$ is closer to 3, and $\beta_1$ has a larger value in
the case of the sum assured.

For the calculation of $y_0$ the value of $\log\dfrac{1}{\sqrt{2\pi}} = \bar{1}.6009100657$
is required.

In finding the areas for the comparison between the
graduated and ungraduated figures it is unnecessary to
calculate the ordinates, as one of the calculated tables of the
probability integral can be used. The best table was recently
given by Mr. W. F. Sheppard (Biometrika, vol. ii., pp. 174, &c.),
and the columns in the following table show how it was used
to calculate the areas in one of the cases (the reserves).
Mr. Sheppard's tables give the areas and ordinates of the
normal curve in terms of the standard deviation; that is, he
assumes the standard deviation to be unity, and his tables
must be entered by using intervals of $\sigma$.

         Distance from       Previous     Values from Sheppard's    Difference of        Area multi-
         origin in           column       Tables using              previous column      plied by 347.7
 Age     calculation         × σ⁻¹.       differences (area         = area for age       (total
         units, i.e.,                     from origin to x).        group x to x + 5.    frequency).
         5 years of age.

 14.5      5.893443          3.541258           ...                    .00164*                .6
 19.5      4.893443          2.940377          .99836                  .00785                2.7
 24.5      3.893443          2.339496          .99049                  .03152               10.9
 29.5      2.893443          1.738615          .95897                  .08659               30.1
 34.5      1.893443          1.137734          .87238                  .16806               58.4
 39.5       .893443           .536853          .70432                  .22985†              80.1
 44.5       .106557           .064028          .52553                  .22141               76.9
 49.5      1.106557           .664909          .74694                  .15018               52.2
 54.5      2.106557          1.265790          .89712                  .07190               25.0
 59.5      3.106557          1.866671          .96902                  .02418                8.4
 64.5      4.106557          2.467552          .99320                  .00572                2.0
 69.5      5.106557          3.068413          .99892                  .00108*                .4

* Remainders of areas beyond 19.5 and 69.5.

† (.70432 - .50000) + (.52553 - .50000) because we pass across the origin,
and a piece of the group is on each side of it.


The  second  column  can  be  left  out  when  the  method  has 
been  grasped.  The  ages  in  the  first  column  were  taken 
consistently  with  the  assumptions  that  17,  22,  etc.,  were  the 
central  ages  of  the  groups. 
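The same areas can be obtained without tables. The Python sketch below is an illustration under stated assumptions: the error function `math.erf` from the standard library stands in for Sheppard's tables, the function names are hypothetical, and the mean age, $\mu_2$ and total frequency are those quoted for the reserves.

```python
import math


def normal_cdf(t):
    """Area of the unit normal curve from minus infinity to t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def normal_y0(total, mu2):
    """Central ordinate y0 = N / sqrt(2*pi*mu2), in working units."""
    return total / math.sqrt(2.0 * math.pi * mu2)


def graduated_groups(total, mean_age, mu2, boundaries, unit=5.0):
    """Graduated frequencies for the groups cut off by `boundaries`.

    The first and last entries are the remainders of area beyond the
    outermost boundaries; mu2 is in units of `unit` years.
    """
    sigma_years = math.sqrt(mu2) * unit
    cdf = [normal_cdf((b - mean_age) / sigma_years) for b in boundaries]
    areas = [cdf[0]]
    areas += [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    areas.append(1.0 - cdf[-1])
    return [total * area for area in areas]


if __name__ == "__main__":
    bounds = [19.5 + 5 * i for i in range(11)]   # 19.5, 24.5, ..., 69.5
    for value in graduated_groups(347.7, 43.967213, 2.769635, bounds):
        print(round(value, 1))
    print(normal_y0(347.7, 2.769635))
```

Working from the cumulative area throughout avoids the special treatment needed in the table for the group which lies across the origin.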

If  ordinates  are  required,  the  z  column  in  Mr.  Sheppard's 
tables  must  be  used.  It  was  with  its  help  that  the  curves  in 
the  figure  were  drawn.  The  statistics  and  curve  for  the 
reserves  are  shown  by  the  dotted  lines. 
[Figure: Type VII. Sums assured ÷ 1,000 and reserves plotted against age (17, 22, &c.).]


An  average  reserve  for  any  group  can  be  obtained  by 
means  of  the  graduated  figures,  and  it  could  be  used  to  test 
the  reserves  obtained  at  any  future  valuation.  This  is  by  no 
means  the  only  rough  check  that  can  be  applied,  but  it  is 
interesting  because  it  shows  a  use  to  which  frequency-curves 
might  be  put  in  practical  office  routine. 
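PROOF.

To show that

$$\int_0^{\infty} e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2},$$

let

$$\int_0^{\infty} e^{-x^2}\,dx = K;$$

then, substituting $ax$ for $x$, we have

$$\int_0^{\infty} e^{-a^2x^2}\,a\,dx = K.$$

Hence,

$$\int_0^{\infty}\!\!\int_0^{\infty} e^{-a^2(1 + x^2)}\,a\,da\,dx = K\int_0^{\infty} e^{-a^2}\,da = K^2.$$

But

$$\int_0^{\infty} e^{-a^2(1 + x^2)}\,a\,da = \frac{1}{2}\,\frac{1}{1 + x^2},$$

$$\therefore\ K^2 = \frac{1}{2}\int_0^{\infty}\frac{dx}{1 + x^2} = \frac{\pi}{4}.$$

Hence,

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.$$

The other constant is obtained as follows:

$$N = \int_{-\infty}^{\infty} y_0e^{-x^2/c}\,dx = y_0\left[x\,e^{-x^2/c}\right]_{-\infty}^{\infty} + y_0\int_{-\infty}^{\infty}\frac{2x}{c}\,e^{-x^2/c}\,x\,dx \quad\text{(by parts)}$$

$$= \frac{2}{c}\,N\mu_2,$$

$$\therefore\ c = 2\mu_2.$$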
ADDITIONAL EXAMPLES.

8. Up to the present we have merely considered examples
with a view to illustrating the various types of
frequency-curves, but it seems advisable to consider one or two practical
examples which may help to show the range of applicability
of the curves in actuarial work, and give an opportunity of
noticing a few difficulties which may arise in applying them.

The  function  with  which  actuaries  generally  wish  to  deal 
in  practical  work  is  not  an  exposed  to  risk  or  series  of  deaths 
or  withdrawals,  but  the  ratio  between  the  deaths  and  the 
exposed;  that  is,  with  the  rates  of  mortality,  sickness, 
marriage,  and  withdrawal.  An  actuary  studying  frequency- 
curves  may  therefore  naturally  ask  whether  any  of  these 
rates  can  be  graduated  by  means  of  the  curves  we  have 
examined,  and,  if  they  fail,  must  they  be  put  aside  for  some 
other method? Now the first point to be considered is
whether  these  rates  are  frequency  distributions  ;  if  they  are 
not,  the  use  of  the  frequency-curve  is  empirical.  A  rate  of 
mortality  gives  the  proportion  of  people  at  each  age  who  die, 
and  if  we  imagine  1,000  persons  exposed  to  risk  at  each 
integral  age,  the  number  of  deaths  would  be  1,000  times  the 
rate  of  mortality,  and  this  seems  to  show  that  it  is  possible  to 
consider the rate of mortality as a distribution, though it is
one that could hardly arise in actual experience. It is
impossible  to  describe  the  rates  of  mortality  or  sickness  by  a 
single  frequency-curve.  On  the  other  hand,  the  rates  of 
marriage  are  certainly  much  like  frequency-curves,  and  the 
rates  of  withdrawal,  whether  regarded  according  to  age  or 
duration,  might  take  a  form  like  our  example  in  Type  III. 
There  are,  however,  practical  objections  to  the  direct  operation 
on  rates,  even  apart  from  the  very  exaggerated  idea  of 
frequency  distributions  in  which  it  is  necessary  to  indulge. 
The  numbers  exposed  to  risk  at  the  end  of  any  table  become 
small,  and  a  single  death  or  marriage  there  gives  a  very  large 
rate,  while  at  several  ages  near  there  may  be  a  zero  rate 
shown  by  the  ungraduated  data.  This  is  extremely  awkward, 
as it tends to make the ratios dealt with far rougher in
application  than  the  actual  observations  are  in  fact,  and  we  are 
forced  to  group  the  material  before  using  it,  which  introduces 
an   arbitrary   practice    which    it    is    well   to   avoid   as   far   as 
possible. It must not, of course, be inferred that a small
number  of  say  fifty  or  one  hundred  deaths  must  necessarily 
be  grouped  according  to  each  year  of  age,  but  that 
even  if  there  are  two  or  three  thousand  the  roughnesses 
introduced  by  the  use  of  rates  influence  the  result 
considerably.  The  reason  is  that  an  equal  weight  is  given  to 
each  rate  of  mortality  which  is  very  far  from  the  weight 
indicated  by  the  exposed  to  risk. 

9. It will be useful to consider a case bearing out these
objections and then deal with a practical method of
overcoming them. The statistics to be considered have been
taken from a paper by Mr. M. Mackenzie Lees "On Rates of
Mortality and Marriage among daughters of Peers and Heirs
Apparent, &c." (Transactions of the Faculty of Actuaries,
vol. i., p. 276), and may be summarized as on page 94.

The moments were calculated by Mr. G. F. Hardy's
Summation Method, and were found, about the mean
28.77191, to be

$\mu_2 = 63.2092$
$\mu_3 = 627.101$
$\mu_4 = 19{,}103.3$
$\beta_1 = 1.557153$
$\beta_2 = 4.781321$

The criterion was $\kappa = -1.5$, but as I had neglected the
rate .0089 at 71 in calculating the moments, I used Type III.
The inclusion of the rate at that age would have lengthened
the curve and considerably increased the arithmetical value
of the criterion.

The constants for Type III. were

$\gamma = .201592$
$p = 1.56881$
$a = 7.78189$
Mode $= 23.81128$

The curve starts, therefore, at age 16.02939.

$y_0 = 890.05$.
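The constants just quoted can be retraced with a few lines of arithmetic. The Python sketch below rests on assumptions that are not stated in this section: the criterion is taken as $\kappa = \dfrac{\beta_1(\beta_2 + 3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)}$, and the Type III. relations as $\gamma = 2\mu_2/\mu_3$, $p = 4/\beta_1 - 1$, $a = p/\gamma$ and mode $=$ mean $- \mu_3/2\mu_2$, the curve starting at a distance $a$ before the mode. The function names are hypothetical.

```python
def criterion(beta1, beta2):
    """Criterion kappa computed from beta1 and beta2 (assumed form)."""
    return (beta1 * (beta2 + 3.0) ** 2
            / (4.0 * (4.0 * beta2 - 3.0 * beta1)
               * (2.0 * beta2 - 3.0 * beta1 - 6.0)))


def type_iii_constants(mean, mu2, mu3):
    """Type III constants from the mean and the second and third moments."""
    beta1 = mu3 ** 2 / mu2 ** 3
    gamma = 2.0 * mu2 / mu3
    p = 4.0 / beta1 - 1.0
    a = p / gamma
    mode = mean - mu3 / (2.0 * mu2)
    start = mode - a          # age at which the curve begins
    return {"gamma": gamma, "p": p, "a": a, "mode": mode, "start": start}


if __name__ == "__main__":
    print(criterion(1.557153, 4.781321))
    print(type_iii_constants(28.77191, 63.2092, 627.101))
```

The same two functions serve for any other series once its mean and moments have been found.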

The  rates  resulting  from  this  graduation  are  given  in  the 
table,  and  while  they  tend  to  show  that  the  distribution 
of  rates  of  marriage  is  closely  allied  to  a  frequency-curve, 
they     do     not     give     a    satisfactory     graduation,     and     the 
Marriage Rates of Spinsters.

                                            Rate of Marriage
 Age    Exposed     No. of      Rate of     Graduated by      Hypothetical    No. of      Rate of
        to Risk     Marriages   Marriage    Frequency         Exposed         Marriages   Marriage
  x       E_x         M_x         m_x       Curve.              E'_x            M'_x      Graduated.

  15    3,658           3        .0008        ...              3,695             3          ...
  16    3,603           8        .0022       .0027             3,433             7         .0018
  17    3,528.5        49        .0139       .0132             3,187            44         .0157
  18    3,393.5       114        .0336       .0332             2,957            99         .0350
  19    3,187         176        .0552       .0517             2,742           151         .0541
  20    2,945         219        .0744       .0667             2,541           189         .0695
  21    2,688.5       192        .0714       .0776             2,354           168         .0809
  22    2,443         211        .0864       .0846             2,179           188         .0880
  23    2,187         212        .0969       .0881             2,016           196         .0917
  24    1,956         194        .0992       .0889             1,861           185         .0920
  25    1,758         146        .0831       .0875             1,723           143         .0901
  26    1,583.5       137        .0865       .0845             1,591           138         .0861
  27    1,417         121        .0854       .0803             1,469           126         .0812
  28    1,270.5       105        .0826       .0753             1,355           112         .0754
  29    1,148.5        75        .0653       .0698             1,249            82         .0693
  30    1,068          60        .0562       .0640             1,151            65         .0631
  31      984          64        .0650       .0583             1,061            69         .0569
  32      904.5        41        .0453       .0528               976            43         .0508
  33      848.5        30        .0354       .0475               897            32         .0452
  34      802          39        .0486       .0425               825            40         .0400
  35      752          20        .0266       .0378               758            20         .0352
  36      711          25        .0352       .0335               696            25         .0309
  37      672.5        18        .0268       .0295               639            17         .0270
  38      638          11        .0172       .0260               586            10         .0235
  39      612.5        14        .0229       .0228               537            12         .0205
  40      586.5        15        .0256       .0199               492            12         .0176
  41      568.5         9        .0158       .0173               451             7         .0151
  42      541.5         6        .0111       .0151               412             5         .0130
  43      515           8        .0155       .0131               376             6         .0112
  44      491.5         2        .0041       .0113               345             1         .0096
  45      476           5        .0105       .0097               315             3         .0082
  46      454           5        .0110       .0084               288             3         .0070
  47      440.5         2        .0045       .0072               262             1         .0060
  48      416           5        .0120       .0062               239             3         .0051
  49      395           2        .0051       .0054               218             1         .0044
  50      378.5       ...         ...        .0046               199           ...         .0037
  51      363.5       ...         ...        .0039               181           ...         .0031
  52      348.5         1        .0029       .0034               165           ...         .0026
  53      335.5         2        .0089       .0029               150             1         .0022
  54      317.5       ...         ...        .0024               136           ...         .0019
  55      304         ...         ...        .0020               124           ...         .0016
  56      291           3        .0103       .0018               ...             1          ...
  57      278.5         1        .0036       .0015               ...           ...          ...
  58      261         ...         ...        .0013               ...           ...          ...
  59      248.5       ...         ...        .0011               ...           ...          ...
  60      234.5       ...         ...        .0009                75.7         ...         .0007
  61      219.5         1        .0046       .0007               ...           ...          ...
  62      209.5       ...         ...        .0006               ...           ...          ...
  63      201.5       ...         ...        .0005               ...           ...          ...
  64      191         ...         ...        .0004               ...           ...          ...
  65      177         ...         ...        .0004                45.7         ...         .0003
  66      165.5       ...         ...        .0003               ...           ...          ...
  67      154         ...         ...        .0002               ...           ...          ...
  68      147.5       ...         ...        .0002               ...           ...          ...
  69      135.5       ...         ...        .0001               ...           ...          ...
  70      124.5       ...         ...        .0001                27.2         ...         .0001
  71      112.5         1        .0089        ...                ...           ...          ...
  72      105.5       ...         ...         ...                ...           ...          ...
  73       95         ...         ...         ...                ...           ...          ...
  74       84.5       ...         ...         ...                ...           ...          ...
  75       79         ...         ...         ...                ...           ...          ...
failure is due almost entirely to the objections referred
to  above.  Of  course,  if  we  were  examining  the  algebraic 
form  taken  by  rates  of  marriage,  we  should  begin  by  work  on 
population  data  where  the  roughness  of  material  is  avoided 
by  the  large  numbers  of  individuals  dealt  with  ;  as,  however, 
we  are  seeking  for  a  graduation,  we  must  see  how  these 
objections,  which  of  course  apply  to  some  extent  to  any 
method  of  graduation,  can  be  overcome.  It  has  been  remarked 
that  the  cause  of  the  difficulty  is  that  incorrect  weights 
are  given  to  the  items  used,  and  the  most  obvious  suggestion 
is  that  the  actual  exposed  and  marriages  should  be  graduated 
separately.  This,  however,  entails  a  large  amount  of 
additional  work,  and  a  shorter  method  can  be  used  which 
avoids  the  double  graduation.  This  method  consists  of  using 
a  series  allied  to  the  exposed,  and  treating  it  as  a  hypothetical 
exposed to risk from which a new series of marriages can be
calculated. The advantages are that we have only to make
one graduation, and the weights of the various parts of
the table are given approximately. In a similar way $q_x$ can
be graduated, and in this connection it may be remarked that
as the exposed to risk is generally capable of being
represented  by  a  frequency-curve,  it  is  natural  to  suggest  that 
the  hypothetical  exposed  might  be  taken  as  the  simplest  form 
assumed  by  such  curves  (viz.,  Type  VII.);  this  is  also 
convenient  because  the  ordinates  for  such  curves  have  been 
tabulated. 

10. The hypothetical exposed can be fixed by trial or
from the values of the exposed. The column $E'_x$ in the
table given above is taken from Sheppard's Tables of the
Probability Integral, $x$ being taken as 3.06, 3.084, 3.108,
3.132, &c., and the entries were multiplied by $10^6$.
$M'_x = E'_x \times m_x$ was then formed and graduated. The
following values were obtained for the $M'_x$ series:

Mean $= 24.85779$
$\mu_2 = 29.5006$
$\mu_3 = 190.112$
$\mu_4 = 4{,}361.12$
$\beta_1 = 1.40775$
$\beta_2 = 5.01114$
$\kappa = -7.102$

As this is large, Type III. was used, and

$\gamma = .310350$
$p = 1.841405$
$a = 5.933325$
$y_0 = 192.625$
Mode $= 21.63562$

The curve was then worked out and the rates of marriage
in the final column were obtained by dividing $M'$ by $E'$. They
agree closely with the ungraduated figures.
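The formation of the hypothetical exposed and marriages can be sketched in Python as follows. The sketch is illustrative only: the function names are hypothetical, the ordinate of the unit normal curve is computed directly instead of being read from Sheppard's tables, and the multiplier is assumed to be $10^6$, since that figure reproduces the tabulated 3,695 at age 15 from the argument 3.06.

```python
import math


def normal_ordinate(t):
    """Ordinate of the normal curve with unit standard deviation."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def hypothetical_exposed(age, first_age=15, t0=3.06, step=0.024, scale=1e6):
    """E'_x: normal ordinate at t0 + step*(age - first_age), times scale.

    Assumption: scale = 10**6 (it reproduces E' = 3,695 at age 15).
    """
    return scale * normal_ordinate(t0 + step * (age - first_age))


def hypothetical_marriages(age, rate):
    """M'_x = E'_x * m_x for an ungraduated rate m_x."""
    return hypothetical_exposed(age) * rate


if __name__ == "__main__":
    for age, rate in [(17, 0.0139), (20, 0.0744), (23, 0.0969)]:
        print(age, round(hypothetical_exposed(age)),
              round(hypothetical_marriages(age, rate)))
```

Once the $M'_x$ series has been graduated, each graduated rate is the graduated $M'_x$ divided by the corresponding `hypothetical_exposed(age)`.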
11.  A  numerical  example  of  the  application  of  the  method  to 
the  0MX"""  Table  may  now  be  given.     The  normal  curve  with 
$\sigma = 10$ and origin at age 52½ was used, and the values were
multiplied by $q_x$ with the help of Crelle's tables.
A part of the work was

q × E × 10⁵      Age.     Ordinate from Sheppard's      Age.      q × E × 10⁵
                              Tables = E

    810           52            .3984439                 53           801
    597           51            .3944793                 54           850
    644           50            .3866681                 55           875
    &c.

Summing these entries ($q \times E \times 10^5$) in fives, I formed the
following:

 Age      q × E × 10⁵

  20            13
  25            70
  30           218
  35           594
  40         1,394
  45         2,460
  50         3,702
  55         4,519
  60         4,385
  65         3,602
  70         2,249
  75         1,197
  80           461
  85           133
  90            31
  95             5
 100             1

            25,034

The  abbreviations  (use  of  Crelle's  tables  and  grouping) 
were  adopted  to  save  labour,  and  as  the  figures  were  required 
for  an  example  they  are  sufficiently  accurate. 


The following values were then found:

Mean Age $= 59.439762$
$\mu_2 = 4.584327$
$\mu_3 = -.4999871$
$\mu_4 = 61.17014$
Type of curve: No. I.
$m_1 = 32.81166$
$m_2 = 26.57123$
$a_1 = 18.78553$
$a_2 = 15.21272$
$y_0 = 4609.884$
Mode age $= 59.730789$
(The unit is 5 years of age.)

The  ordinates  were  then  calculated  for  every  fifth  age,  and 
finding  that  the  curve  is  not  very  far  removed  from  the  normal 
curve  of  error,  I  interpolated  in  the  second  differences  of  the 
logarithms  of  the  ordinates  for  those  at  the  other  ages.*  A 
quadrature  formula  was  used  for  finding  areas,  and  qx  was 
found  by  dividing  by  the  hypothetical  figure  already  used  for 
the  exposed. 

The expected deaths were as follows:

             Graduated q_x                                      Deviation.
 Group.      for Central Age      Actual.     Expected.
             of group.                                         +          -

 15-19            ...               ...           1.5         1.5
 20-             .00643               9           8.9                     .1
 25-             .00731              69          61.0                    8.0
 30-             .00850             205         204.6                     .4
 35-             .00991             369         380.7        11.7
 40-             .01179             588         575.6                   12.4
 45-             .01452             801         811.4        10.4
 50-             .01866           1,064       1,063.8                     .2
 55-             .02505           1,399       1,386.6                   12.4
 60-             .03516           1,752       1,773.2        21.2
 65-             .05118           2,164       2,136.7                   27.3
 70-             .07682           2,216       2,261.2        45.2
 75-             .11648           1,965       1,925.8                   39.2
 80-             .17462           1,237       1,241.9         4.9
 85-             .24870             494         514.4        20.4
 90-             .33286             129         126.0                    3.0
 95-             .43289              18          17.3                     .7
100-              ...                 1           1.5          .5

                                 14,480      14,492.1       115.8      103.7

21 

9-5 

* As $e^{-x^2/2\sigma^2}$ is the equation to the normal curve, the logarithm is $Ax^2 + Bx + C$,
say. The criterion of course shows if the curve is nearly normal.

12. It will be interesting to examine a particular case of the
method  just  described,  as  it  is  often  required  by  actuaries. 

Defining Makeham's hypothesis as $\operatorname{colog} p_x = A + Bc^x$, we
take a normal curve ($y_0 e^{-(x-h)^2/2\sigma^2}$) to represent the exposed
and multiply by the values of $\operatorname{colog} p_x$. This means that
we assume that the products can be represented by

$$y = (A + Bc^x)\, y_0 e^{-(x-h)^2/2\sigma^2}$$

$$= A y_0 e^{-(x-h)^2/2\sigma^2} + HB y_0 e^{-(x-[h+\sigma^2\log_e c])^2/2\sigma^2},$$

where $H = e^{(h^2 + 2\sigma^2 h\log_e c + \sigma^4(\log_e c)^2 - h^2)/2\sigma^2} = e^{h\log_e c + \frac{1}{2}\sigma^2(\log_e c)^2}$; or, writing $t$ for $h + \sigma^2\log_e c$,

$$y = A y_0 e^{-(x-h)^2/2\sigma^2} + HB y_0 e^{-(x-t)^2/2\sigma^2} \qquad \text{I.}$$

i.e., the sum of two normal curves both having the same
standard deviation as the exposed curve and one having the
same origin.

The difference between the two origins gives $\sigma^2\log_e c$,

so $$\log_{10} c = \frac{t-h}{\sigma^2}\,\log_{10} e.$$

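The whole solution is made very simple by taking moments
about the known origin $h$. With $x$ measured from $h$,
$\int_{-\infty}^{+\infty} xy\,dx$ and $\int_{-\infty}^{+\infty} x^2y\,dx$ (the
first two moments) give $(t-h)N_2$* and $N_1\sigma^2 + N_2(\sigma^2 + (t-h)^2)$,†
where $N_1 = Ay_0\sigma\sqrt{2\pi}$ and $N_2 = HBy_0\sigma\sqrt{2\pi}$ are the areas of the two normal curves in I.

Dividing the values just given by $N_1 + N_2$ (the total
frequency), we obtain, as the first moment about the known
origin, $\mu_1' = \dfrac{N_2(t-h)}{N_1+N_2}$, and, as the second,
$\mu_2' = \sigma^2 + \dfrac{N_2(t-h)^2}{N_1+N_2}$,

or $$t - h = \frac{\mu_2' - \sigma^2}{\mu_1'}$$

and $$N_2 = \frac{\mu_1'(N_1+N_2)}{t-h},$$

where $\mu'$ is written for moments about $h$, so that

$$\log_{10} c = \frac{\mu_2' - \sigma^2}{\mu_1'\sigma^2}\,\log_{10} e \qquad \text{II.}$$

as stated above, and if $y_0 = \dfrac{10^k}{\sigma\sqrt{2\pi}}$, as is, of course, generally
convenient, then

$$A = N_1 \div 10^k$$

and

$$B = N_2 \div (10^k \times H) = \frac{N_2}{10^k\, e^{h\log_e c + \frac{1}{2}\sigma^2(\log_e c)^2}} = \frac{N_2}{10^k\, c^{h}\, e^{\frac{1}{2}(t-h)\log_e c}} = \frac{N_2}{10^k\, c^{\frac{t+h}{2}}}$$

(see equation II.)

* Remember that the normal curve is symmetrical, so that the odd moments
about the mean of such a curve are zero.

† Can be seen at once as the sum of two integrals; $N_1\sigma^2$ gives the second
moment of the first normal curve in I., and $N_2(\sigma^2 + (t-h)^2)$ gives the second
moment of the second normal curve.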


13. If we assume, as Mr. Hardy does in his recent graduations
of the new experience, that $\log_{10} c$ is known, we only require
to calculate one moment, which gives us $\dfrac{N_2}{N_1 + N_2}$, and this,
with the help of equation II., enables us to complete
the solution. If $c$ were obtained for the aggregate table we
should use this result for the select tables.

14. A numerical example with the $O^{NM(5)}$ Table may be of
interest. A normal curve with standard deviation 10 and
origin 55½ was taken, and the terms multiplied by $\operatorname{colog} p_x$.
These were then grouped in fives, and the first two moments
calculated about age 55½. One little point should be borne in
mind in connection with the grouping; though the centre of
the base on which the product $q_x \times$ exposed stands is $x + \frac{1}{2}$,
the result $\operatorname{colog} p_x \times$ exposed is an ordinate at $x$; the centre
point of five ages 20 to 24 is 22½ when $q_x$ is used and 22 when
$\operatorname{colog} p_x$ is used.

The figures were

  $N_1 + N_2$ = 136387.
  1st moment about 55½ in 5-year groups = 1.416184.
  2nd    „       „     „       „        = 4.1929354

Deducting (W. F. Sheppard's adjustment) $\frac{1}{12}$ from the
second moment and multiplying the first moment and the
adjusted second moment by 5 and 25 respectively to make
the unit one year instead of 5 years, we have

  $\mu_1'$ =   7.080920
  $\mu_2'$ = 164.384085

then

  $\log_{10}(t-h)$ = .9586889
  $t - h$          = 9.092617
  $\log_{10} c$    = .03948873
  $A$              = .00301749
  $B$              = .00004518782
  $\log_{10} B$    = $\bar{5}$.6550214
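The arithmetic of this example can be retraced with the formulas of Art. 12. The sketch below is an illustration only: the function name is hypothetical, and the value $k = 7$ is an assumption inferred from the printed value of $A$ (it is not stated in the text).

```python
import math

def makeham_from_moments(mu1, mu2, sigma, h, total, k):
    """Solve for t - h, log10(c), A and B.

    mu1, mu2 : first two moments about h of (colog p_x * exposed), unit one year
    sigma, h : standard deviation and origin of the normal curve for the exposed
    total    : N1 + N2, the total frequency
    k        : the exposed curve is taken with y0 = 10**k / (sigma * sqrt(2*pi))
    """
    t_minus_h = (mu2 - sigma ** 2) / mu1
    log10_c = t_minus_h / sigma ** 2 * math.log10(math.e)   # equation II
    n2 = mu1 * total / t_minus_h
    n1 = total - n2
    a = n1 / 10 ** k
    t = h + t_minus_h
    b = n2 / (10 ** k * 10 ** (log10_c * (t + h) / 2))       # N2 / (10^k c^((t+h)/2))
    return t_minus_h, log10_c, a, b

# Figures from the text; k = 7 is an inferred assumption.
t_minus_h, log10_c, a, b = makeham_from_moments(
    7.080920, 164.384085, 10.0, 55.5, 136387, 7
)
print(t_minus_h, log10_c, a, b)
```

Only two moments and no trial values of $c$ are needed, which is the advantage claimed for the method.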

$q_x$ was then calculated from the graduated $\operatorname{colog} p_x$ obtained
from the values of $A$, $B$ and $c$, and the following table of
expected deaths was worked out. The values of $q_x$ are given
in the table showing the frequency-curve graduation:

  Age          Graduated q_x       Expected                 Deviation.
  Group.       for Central Age     Deaths.      Deaths.      +        -
               of Group.
  Under 25         ...               13.0           9        4.0
  25-             .00812             67.0          69                 2.0
  30-             .00882            211.6         205        6.6
  35-             .00991            380.8         369       11.8
  40-             .01162            566.9         588                21.1
  45-             .01431            799.7         801                 1.3
  50-             .01854           1057.5       1,064                 6.5
  55-             .02517           1392.7       1,399                 6.3
  60-             .03551           1790.2       1,752       38.2
  65-             .05160           2153.0       2,164                11.0
  70-             .07639           2249.3       2,216       33.3
  75-             .11415           1888.7       1,965                76.3
  80-             .17053           1213.6       1,237                23.4
  85-             .23352            519.1         494       25.1
  90-             .36484            136.6         129        7.6
  95-              ...               20.6          19        1.6
  Total                           14460.3      14,480      128.2    147.9
                                                               276.1
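The graduated rates in the table can be regenerated from the constants found above. The sketch below makes two assumptions that are not stated outright in the text: that $\operatorname{colog} p_x$ is taken to base 10, and that the central age of a group such as 25- is the first age plus 2. The function name is hypothetical.

```python
def makeham_qx(age, a=0.00301749, log10_b=-4.3449786, log10_c=0.03948873):
    """q_x from colog10 p_x = A + B c^x (log10_b is the log of B = .00004518782)."""
    colog_p = a + 10 ** (log10_b + log10_c * age)
    return 1 - 10 ** (-colog_p)

# Central ages of the groups 25-, 30-, 60-, 80- and 90-.
for central_age in (27, 32, 62, 82, 92):
    print(central_age, round(makeham_qx(central_age), 5))
```

Multiplying such rates by the actual exposed to risk at each age would give the expected deaths of the table; the exposed are not reproduced in this section, so that step is left out.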

This  result,  which  would  have  been  improved  by  using  all 
the  terms  instead  of  grouping,  is  very  like  that  given  by 
Mr. G. F. Hardy, but avoids having to obtain c by trial.
Mr.  Hardy's  expected  and  actual  deaths  balance  better  than 
the  above,  but  I  do  not  think  the  rates  have  been  understated 
systematically,  as  the  75-79  group  accounts  for  the 
disagreement. The total deviation is less than Mr. Hardy's.

15. When dealing with the adjustments for moments it was
remarked that it is best to use unadjusted moments except
when there is high contact at each end of the curve. In some
cases the unadjusted moments are rather far from the truth,
and consequently the curve obtained is by no means the best
that can be found. This most frequently happens when the
curve rises very abruptly, as in our example for Type I., or
when it takes the form of the example for Type III., the reason
being that, in such cases, the assumption that an area is
concentrated at the middle of a base of unit length involves
a considerable error. In the Type I. example, for instance,
the first group was assumed to be an ordinate at 17, whereas
the curve starts at age 16.76, and the central point ought,
therefore, to be later. The results can sometimes be improved
considerably by basing a second graduation on the first and,
in the particular case just referred to, since the first group is
too large, we might assume that the curve should start at 17.5.
It is unnecessary to find as many as four moments, for by
assuming that the start of the curve and the range ($b$) are
known, the equations on p. 59, giving the moments about the
start of the curve, afford a very simple solution. We have


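$$\mu_1' = \frac{b(m_1+1)}{m_1+m_2+2} \quad\text{and}\quad \mu_2' = \frac{b^2(m_1+1)(m_1+2)}{(m_1+m_2+2)(m_1+m_2+3)},$$

and writing $\gamma_1 = \dfrac{\mu_1'}{b}$ and $\gamma_2 = \dfrac{\mu_2'}{\mu_1' b}$,

we have $$m_1 + 1 = \frac{\gamma_1(\gamma_2-1)}{\gamma_1-\gamma_2}$$

and $$m_2 + 1 = \frac{(\gamma_2-1)(1-\gamma_1)}{\gamma_1-\gamma_2},$$

where $\mu'$ is written for a moment about the start of the curve.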
16.  If  the  range  of  a  curve  can  be  fixed  by  general 
considerations,  a  good  deal  of  labour  can  thus  be  saved,  while, 
if  the  start  of  the  curve  is  known,  the  following  solution 
depending  on  three  moments  is  of  use. 

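Writing $\lambda_2 = \dfrac{\mu_2'}{\mu_1'^2}$ and $\lambda_3 = \dfrac{\mu_3'}{\mu_2'\mu_1'}$,
the values of the constants in the equation to the curve are
given by

$$m_1 + 1 = \frac{2(\lambda_2-\lambda_3)}{2\lambda_3-\lambda_2-\lambda_2\lambda_3},$$

$$m_2 + 1 = \frac{2(\lambda_2-\lambda_3)(\lambda_3-1)(1-\lambda_2)}{(2\lambda_3-\lambda_2-\lambda_2\lambda_3)(1+\lambda_3-2\lambda_2)},$$

$$b = \mu_1'\,\frac{m_1+m_2+2}{m_1+1},$$

and $a_1 : a_2 = m_1 : m_2$.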
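The three-moment solution can be tried on a curve whose constants are known beforehand. In the sketch below the moments about the start are generated from chosen values of $m_1$, $m_2$ and $b$ by the standard expressions for a curve $y = y_0 x^{m_1}(b-x)^{m_2}$ (the third-moment expression is an assumption, being the natural extension of the two given in Art. 15), and the solution is then asked to recover them. All function names are hypothetical.

```python
def moments_about_start(m1, m2, b):
    """First three moments about the start for y = y0 * x**m1 * (b - x)**m2."""
    alpha, s = m1 + 1, m1 + m2 + 2
    mu1 = b * alpha / s
    mu2 = b ** 2 * alpha * (alpha + 1) / (s * (s + 1))
    mu3 = b ** 3 * alpha * (alpha + 1) * (alpha + 2) / (s * (s + 1) * (s + 2))
    return mu1, mu2, mu3

def type1_from_start(mu1, mu2, mu3):
    """Return (m1, m2, b) from three moments about the known start of the curve."""
    lam2 = mu2 / mu1 ** 2
    lam3 = mu3 / (mu2 * mu1)
    d = 2 * lam3 - lam2 - lam2 * lam3
    m1_plus_1 = 2 * (lam2 - lam3) / d
    m2_plus_1 = (2 * (lam2 - lam3) * (lam3 - 1) * (1 - lam2)
                 / (d * (1 + lam3 - 2 * lam2)))
    b = mu1 * (m1_plus_1 + m2_plus_1) / m1_plus_1
    return m1_plus_1 - 1, m2_plus_1 - 1, b

# Round trip on an invented curve: m1 = 1, m2 = 2, range 10.
print(type1_from_start(*moments_about_start(1.0, 2.0, 10.0)))
```

The round trip uses invented constants purely as a check on the algebra; it is not an example from the text.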

17. Returning to the example of Type I., and considering the
line for age 22 in the table on p. 54, we see that 4.175 and
14.634 give $S_2$ and $S_3$, excluding the first group, and the
moments about age 17 are then found to be 4.175 and 29.268;
transferring to 17.5, we have 4.075 and 24.268; adding the
moments for the first group, $.034 \times \frac{1}{5}$ and $.034 \times (\frac{1}{5})^2$ respectively,
$\mu_1' = 4.0818$ and $\mu_2' = 24.26936$. Assuming a range of 15.5, and
using the formulas given above,

  $m_1$ =    .3498
  $m_2$ =   2.7758
  $a_1$ =   1.735
  $a_2$ =  13.765
  $y_0$ = 154.2

and the mode is 17.5 + 1.735 × 5 = 26.175.

From these values the graduated figures for the first four
groups are 37, 140, 152, 143, which is an improvement on the
figures obtained previously. This example is used for
convenience, but with regard to adjustments in the particular
case the remarks in the footnote on p. 55 should be borne in
mind.
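The figures of this article follow directly from the two-moment formulas of Art. 15 together with the relation $a_1 : a_2 = m_1 : m_2$. A minimal sketch, with a hypothetical function name, is:

```python
def type1_known_range(mu1, mu2, b):
    """Return (m1, m2, a1, a2) from two moments about the start and a known range b."""
    g1 = mu1 / b
    g2 = mu2 / (mu1 * b)
    m1 = g1 * (g2 - 1) / (g1 - g2) - 1
    m2 = (g2 - 1) * (1 - g1) / (g1 - g2) - 1
    a1 = b * m1 / (m1 + m2)      # a1 : a2 = m1 : m2, with a1 + a2 = b
    a2 = b - a1
    return m1, m2, a1, a2

m1, m2, a1, a2 = type1_known_range(4.0818, 24.26936, 15.5)
mode_age = 17.5 + 5 * a1         # the unit of grouping is 5 years
print(m1, m2, a1, a2, mode_age)
```

The results agree with the printed constants to the number of figures given.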

18. As a second case, the example for Type III. may be
considered. Assuming that the value of $p$ ($-.0783584$) is not
to be altered, then the first moment about the start of the
curve (see p. 08) $= \dfrac{p+1}{\gamma} = \dfrac{.9216416}{\gamma}$. Assuming the curve to
start at .8 the first moment about the point is calculated
as follows:

   44 ×  .1* =   4.4
  135 ×  .7  =  94.5
   45 × 1.7  =  76.5
   12 × 2.7  =  32.4
    8 × 3.7  =  29.6
    3 × 4.7  =  14.1
    1 × 5.7  =   5.7
    3 × 6.7  =  20.1
  ---          -----
  251          277.3

The first moment is therefore $\dfrac{277.3}{251} = 1.0875$, and hence
$\gamma = .84752$. The value of $y_0$ is 205.0, and the graduation is
47, 123, 48, 19, 8, 3, 2, 1. The equation to the curve is
$y = 205.0\,x^{-.07836}e^{-.84752x}$ with the origin at the start of the
curve, so that in this case the $y_0$ was calculated by the formula
$y_0 = \dfrac{N\gamma^{p+1}}{\Gamma(p+1)}$, which the reader will have no difficulty in
reproducing for himself. When moments are calculated about
the start of the curve and the form taken is that of the present
curve, it is convenient to use the equation in this form rather
than in that given on p. 65. It may be mentioned that as the
value of $p$ is nearly zero in the particular example, a good
result would be obtained by assuming that value, or, which
is the same thing, by putting $y = y_0 e^{-\gamma x}$; $\gamma$ now becomes
$1.0875^{-1} = .91958$, and $y_0 = 251 \times .91958 = 230.8$, and the
graduation is 43, 125, 50, 20, 8, 3, 1, 1.
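The constants of this article can be reproduced from the first moment 1.0875 by the two relations quoted, $\gamma = (p+1)/\mu_1'$ and $y_0 = N\gamma^{p+1}/\Gamma(p+1)$. The sketch below uses Python's `math.gamma` for $\Gamma$; the function name is hypothetical.

```python
import math

def type3_from_start(first_moment, total, p):
    """Return (gamma, y0) for y = y0 * x**p * exp(-gamma * x), origin at the start."""
    gamma = (p + 1) / first_moment
    y0 = total * gamma ** (p + 1) / math.gamma(p + 1)
    return gamma, y0

# With the value of p kept from the earlier graduation.
print(type3_from_start(1.0875, 251, -0.0783584))
# With p put equal to zero, i.e. y = y0 * exp(-gamma * x).
print(type3_from_start(1.0875, 251, 0.0))
```

The small differences from the printed values arise from the rounding of the first moment to four decimals.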

19.  It  sometimes  happens  that  the  error  involved  in  the 
calculations  of  the  moment  tends  to  balance  that  resulting 
from  the  curve  not  starting  at  the  beginning  of  the  unit  base 
assumed  for  the  first  group.  An  instance  of  this  is  the  first 
example of Table I. for which the mean is at duration 4.182,
and the moments and constants are

  $\mu_2$ =   17.63688      $\beta_1$ =  3.34846
  $\mu_3$ =  135.5361       $\beta_2$ =  6.18392
  $\mu_4$ = 1923.565        $\kappa$  = -1.307

so that the curve will be of Type I. and equation to it is

$$y = .89082\,x^{-.629685}(25.49729 - x)^{1.624275}$$

where the origin is at 1.02897 where the curve starts.

* Strictly should be rather earlier, because the ordinates are decreasing very
rapidly.

The graduation by this curve is shown in the following
table:

  Duration.    Withdrawals.    Graduated by
                               Type I. curve.
      1            308             312
      2            200             198
      3            118             101
      4             69              76
      5             59              58
      6             44              45
      7             29              37
      8             28              30
      9             26              25
     10             21              21
     11             18              18
     12             18              15
     13             12              13
     14             11              11
     15              5               9
     16             11               7
     17              7               6
     18              6               5
     19              1               4
     20              3               3
     21              1               2
     22              3               2
     23              2               1
     24            ...               1
    Total        1,000           1,000

20. The calculation of the graduated area of the first group
may present a difficulty, as a quadrature formula cannot be
applied, and the following method gives the best way of
obtaining a correct value:

$$\int_0^x y_0 x^{m_1}(b-x)^{m_2}\,dx = \int_0^x y_0 x^{m_1}\Big\{b^{m_2} - m_2 b^{m_2-1}x + \frac{m_2(m_2-1)}{2!}\,b^{m_2-2}x^2 - \ldots\Big\}\,dx$$

$$= y_0\Big\{\frac{b^{m_2}x^{m_1+1}}{m_1+1} - \frac{m_2 b^{m_2-1}x^{m_1+2}}{m_1+2} + \ldots\Big\},$$

which is a rapidly convergent series when $x$ is small. In the
last example, where $x$ is 1.5 - 1.02897 = .47103, the second term
barely affects the result. $y_0$ must, of course, be calculated by
the formula

$$y_0 = \frac{N}{b^{m_1+m_2+1}}\cdot\frac{\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\,\Gamma(m_2+1)},$$

which is an analogous form to that given for similar Type III.
curves in Art. 18.
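Both statements of this article can be checked numerically on the curve of Art. 19. The sketch below computes $y_0$ from the $\Gamma$-function formula and lists the successive terms of the series for the area of the first group; the function names are hypothetical and the number of terms kept is an arbitrary choice.

```python
import math

def type1_y0(total, b, m1, m2):
    """y0 for y = y0 * x**m1 * (b - x)**m2 with total frequency `total`."""
    return (total / b ** (m1 + m2 + 1)
            * math.gamma(m1 + m2 + 2)
            / (math.gamma(m1 + 1) * math.gamma(m2 + 1)))

def first_group_terms(y0, b, m1, m2, x, n_terms=6):
    """Successive terms of the series for the area from the start of the curve to x."""
    terms = []
    coeff = 1.0   # coefficient of x**k in the expansion of (b - x)**m2, divided by b**(m2 - k)
    for k in range(n_terms):
        terms.append(y0 * coeff * b ** (m2 - k) * x ** (m1 + k + 1) / (m1 + k + 1))
        coeff *= -(m2 - k) / (k + 1)
    return terms

b, m1, m2 = 25.49729, -0.629685, 1.624275
y0 = type1_y0(1000, b, m1, m2)
terms = first_group_terms(y0, b, m1, m2, 1.5 - 1.02897)
print(y0, terms[:3])
```

The second term comes out at under one per cent. of the first, which bears out the remark that it barely affects the result.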

21. The expression for finding the area of the first group in
Type III. curves is

$$\int_0^x y_0 x^{p} e^{-\gamma x}\,dx = y_0 x^{p+1}\Big(\frac{1}{p+1} - \frac{\gamma x}{p+2} + \ldots\Big)$$

PART II.

CHAPTER VI.

CORRELATION.

1. Two measurable characteristics, A and B, are said to be
correlated, when, with different values, $x$ of A, we do not find
the same value, $y$ of B, equally likely to be associated. In
other  words,  certain  values  of  B  are  relatively  more  likely  to 
occur  with  the  value  x  than  others. 

2.  In  practice,  as  one  characteristic  increases,  the  other 
generally  either  steadily  increases  or  decreases,  and  it  is 
exceptional  to  find  that  while  one  increases  steadily  the 
other  increases  for  a  time  and  then  decreases.  Put  in 
a  rough-and-ready  way,  the  definition  can  in  particular  cases, 
with  which  actuaries  are  familiar,  be  stated  :  "  The  mean 
ages  at  maturity  in  Endowment  Assurances  increase  with  the 
unexpired  term  when  the  policies  are  grouped  according  to 
the  unexpired  term " ;  or,  "  the  older  a  bachelor,  the  less 
likely  is  he  to  marry  and  have  children."  There  is  correlation 
between  ages  at  maturity  and  unexpired  term,  and  between 
the  age  of  a  bachelor  and  the  number  of  children,  and  it  is 
required  to  find  a  method  of  measuring  the  amount  of 
correlation  statistically.  The  easiest  way  to  appreciate  the 
nature  of  the  problem  is  with  the  help  of  a  table  of  double 
entry,  such  as  the  following,  which  gives  particulars  of  2,870 
endowment  assurances  grouped  according  to  their  unexpired 
term.  A  little  examination  of  the  table  shows  that  there  is  a 
connection  between  the  two  functions,  but  does  not  give 
any measure of the correlation suitable for comparison with
the experiences of other offices or with that of the same office
at a later date:

  Unexpired                    Central Age at Maturity.                            Mean
  term of                                                                          Maturity Age
  Endowment     30   35   40   45    50    55     60    65   70   75    Total.    for the row.
  Assurances.
     0-4         2  ...  ...    2    26     6     14     6  ...  ...       56        53.75
     5-9         1    1    2    6    62    36     40    22    2  ...      172        55.03
    10-14      ...    2    9   17   117    99    127    52    8    1      432        55.85
    15-19        3  ...    6   24   145   155    237    84   11  ...      665        56.59
    20-24      ...    1  ...    3   133   167    271    78   20    1      674        57.58
    25-29      ...  ...  ...    9    90   123    231    71   11    3      538        57.88
    30-34      ...  ...  ...    1    11    49    127    49    8    2      247        59.94
    35-39      ...  ...  ...  ...   ...     6     49    22  ...  ...       77        61.04
    40-44      ...  ...  ...  ...   ...     2      2     3  ...    1        8        62.50
    45-49      ...  ...  ...  ...   ...   ...    ...     1  ...  ...        1        65.00
    Total        6    4   17   62   584   643  1,098   388   60    8    2,870

Note. For explanation of small numbers (the figures printed in small type
under each frequency in the original table), see Art. 16.
The above table is called a correlation or frequency table,
and a column or row in it is called an array. The middle
value of the variable with which the row is associated is
called its type, so that the third column (i.e., that headed 40)
would be called the $y$-array of type 40, and the fourth row
would be called the $x$-array of type 17.5, because 17.5 is the
middle of the 15-19 group.

3.  Now,  returning  to  our  definition  of  correlation,  and 
examining  the  last  column  of  the  table,  which  gives  the  mean 
age  in  each  row,  we  see  that  the  numbers  in  it  tend  to 
increase  as  we  go  down  the  column,  and  the  age  at  maturity 
is  therefore  correlated  with  the  unexpired  term.  The  figure 
on  p.  108  shows  the  series  of  mean  values  clearly. 
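The means in the last column, and a single figure summarising their drift, can be computed directly from the table. The sketch below re-enters the frequencies, finds the mean maturity age of each row, and then forms the product-moment coefficient $r$ that the later articles of this chapter arrive at. The variable and function names are hypothetical, and the unexpired terms are represented by the middle values 2.5, 7.5, &c., in keeping with the type 17.5 given for the 15-19 group.

```python
import math

ages = list(range(30, 80, 5))                 # central ages at maturity, 30 to 75
terms = [2.5 + 5 * i for i in range(10)]      # middle values of the groups 0-4, 5-9, ...
table = [
    [2, 0, 0, 2, 26, 6, 14, 6, 0, 0],
    [1, 1, 2, 6, 62, 36, 40, 22, 2, 0],
    [0, 2, 9, 17, 117, 99, 127, 52, 8, 1],
    [3, 0, 6, 24, 145, 155, 237, 84, 11, 0],
    [0, 1, 0, 3, 133, 167, 271, 78, 20, 1],
    [0, 0, 0, 9, 90, 123, 231, 71, 11, 3],
    [0, 0, 0, 1, 11, 49, 127, 49, 8, 2],
    [0, 0, 0, 0, 0, 6, 49, 22, 0, 0],
    [0, 0, 0, 0, 0, 2, 2, 3, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
]

def row_means(table, ys):
    """Mean of the column variable within each row (array)."""
    return [sum(f * y for f, y in zip(row, ys)) / sum(row) for row in table]

def correlation(table, xs, ys):
    """Product-moment coefficient for a table with row values xs and column values ys."""
    n = sum(sum(row) for row in table)
    mx = sum(x * sum(row) for x, row in zip(xs, table)) / n
    my = sum(y * f for row in table for y, f in zip(ys, row)) / n
    sxx = sum((x - mx) ** 2 * sum(row) for x, row in zip(xs, table)) / n
    syy = sum((y - my) ** 2 * f for row in table for y, f in zip(ys, row)) / n
    sxy = sum((x - mx) * (y - my) * f
              for x, row in zip(xs, table) for y, f in zip(ys, row)) / n
    return sxy / math.sqrt(sxx * syy)

print([round(m, 2) for m in row_means(table, ages)])
print(correlation(table, terms, ages))
```

The coefficient is computed here without grouping adjustments, so it is only a rough check on the sign and general size of the correlation.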

4. A little consideration will lead to further information, for
if  we  imagine  a  case  in  which  there  is  no  correlation,  the 
series  of  the  means  of  the  rows  will  be  independent  of  the 
other  function;  that  is,  they  will  run  horizontally  when 
plotted  out  as  a  diagram.  Another  point  to  be  noted  is  that 
there  are  two  kinds  of  correlation  (positive  and  negative), 
for the functions may increase together, or one may increase
and the other decrease; in the case of the endowment
assurances  the  correlation  is  positive,  because  as  the  term 
increases  the  mean  maturity  age  also  increases. 
5.  These  introductory  remarks  will  give  an  indication  of  the 
nature  of  the  problem  to  be  solved,  and  may  help  to  render  the 
following  proof  easier  to  follow.  It  should  be  remembered  that 
the  proof  deals  with  a  function  of  n  variables.*  The  table  given 
above  has  only  two  variables,  but  it  is  easy  to  see  how  more 
variables  may  be  introduced  in  similar  tables ;  for  instance, 
endowment  assurances  by  limited  payments  give  three 
variables  (term  of  premium,  term  of  assurance  and  age  of 
life),  while  an  increase  in  the  number  of  lives — say,  joint- 
life  endowment  assurances  by  limited  payments — gives  four 
variables.  We  may  now  consider  what  equation  will  represent 
the numbers in the body of the correlation table and how this
equation  can  be  utilized  to  express  the  relationship  between 
the  two  functions  with  which  the  correlation  table  is  concerned. 
[6.] Let $\eta_1, \eta_2, \eta_3 \ldots \eta_n$ be deviations from their respective
means of a complex of measurable characteristics. The sizes
of the functions measured, or organs, are determined by a
large number of independent contributory causes. Let there
be $m$ of these causes, and let their deviations from their means
be $\epsilon_1, \epsilon_2, \epsilon_3 \ldots \epsilon_m$, then $\eta_1, \eta_2, \eta_3 \ldots \eta_n$ will be functions of
$\epsilon_1, \epsilon_2, \epsilon_3 \ldots \epsilon_m$. Further, if $m > n$ certain of the $\epsilon$'s will
appear only in certain of the $\eta$'s, and the $\epsilon$'s will not be fully
determined for a given $\eta$ complex. We also assume that the
variations in intensity of the contributory causes are small
as compared with their absolute intensity, and that these
variations follow the normal law of distribution; that is, we
assume that the deviations from the mean value can be
graduated by the normal curve of error (Type VII.). The
mean complex being reached with the mean intensities of
contributory causes, we have, by the principle of the
superposition of small quantities,

$$\left.\begin{aligned}
\eta_1 &= a_{11}\epsilon_1 + a_{12}\epsilon_2 + a_{13}\epsilon_3 + \ldots + a_{1m}\epsilon_m \\
\eta_2 &= a_{21}\epsilon_1 + a_{22}\epsilon_2 + a_{23}\epsilon_3 + \ldots + a_{2m}\epsilon_m \\
&\;\;\vdots \\
\eta_n &= a_{n1}\epsilon_1 + a_{n2}\epsilon_2 + a_{n3}\epsilon_3 + \ldots + a_{nm}\epsilon_m
\end{aligned}\right\} \qquad \text{(i.)}$$

* A student reading the subject for the first time would do well to omit the
paragraphs indicated by brackets. After the statistical idea underlying correlation
has been understood, it will be found easier to follow the theoretical work.

The $a$'s are coefficients whose values have to be determined,
and any of the system of $a$'s may be zero, for a particular
contributory cause may have no effect on a particular result.
Further, the chance that we have a conjunction of contributory
causes lying between $\epsilon_1$ and $\epsilon_1 + \delta\epsilon_1$, $\epsilon_2$ and $\epsilon_2 + \delta\epsilon_2$ ... and
between $\epsilon_m$ and $\epsilon_m + \delta\epsilon_m$ will be given by

$$C e^{-\left(\frac{\epsilon_1^2}{2\kappa_1^2} + \frac{\epsilon_2^2}{2\kappa_2^2} + \ldots + \frac{\epsilon_m^2}{2\kappa_m^2}\right)} \times \delta\epsilon_1\,\delta\epsilon_2 \ldots \delta\epsilon_m \qquad \text{(ii.)}$$

where the standard deviations of the distributions are
$\kappa_1, \kappa_2 \ldots \kappa_m$ and $C$ is constant.*

Now, by (i.) let $n$ of the variables $\epsilon$, say the first $n$, be
replaced by the variables $\eta$, then the probability that we have
a complex with organs lying between $\eta_1$ and $\eta_1 + \delta\eta_1$,
$\eta_2$ and $\eta_2 + \delta\eta_2$ ... $\eta_n$ and $\eta_n + \delta\eta_n$, together with a series of
contributory causes lying between $\epsilon_{n+1}$ and $\epsilon_{n+1} + \delta\epsilon_{n+1}$,
$\epsilon_{n+2}$ and $\epsilon_{n+2} + \delta\epsilon_{n+2}$ ... $\epsilon_m$ and $\epsilon_m + \delta\epsilon_m$, will be

$$P' = C' e^{-\frac{1}{2}\phi^2}\,\delta\eta_1\,\delta\eta_2 \ldots \delta\eta_n\,\delta\epsilon_{n+1} \ldots \delta\epsilon_m$$

where $C'$ is a constant, a function of $C$ and the $a$'s, and $\phi^2$
consists of the following parts:

(i.) A quadratic function of the $\eta$'s from $\eta_1$ to $\eta_n$.
(ii.) A quadratic function of the $\epsilon$'s from $\epsilon_{n+1}$ to $\epsilon_m$.
(iii.) A series of functions of the type

$$\begin{aligned}
&\epsilon_{n+1}(b_{1,n+1}\eta_1 + b_{2,n+1}\eta_2 + \ldots + b_{n,n+1}\eta_n) \\
&\epsilon_{n+2}(b_{1,n+2}\eta_1 + b_{2,n+2}\eta_2 + \ldots + b_{n,n+2}\eta_n) \\
&\;\;\vdots \\
&\epsilon_{m}(b_{1,m}\eta_1 + b_{2,m}\eta_2 + \ldots + b_{n,m}\eta_n)
\end{aligned}$$

where some of the $b$'s may be zero.

Now, if $P'$ be integrated for all values from $-\infty$ to $+\infty$ of
all the contributory causes $\epsilon_{n+1}, \epsilon_{n+2} \ldots \epsilon_m$, we shall have the
whole chance of a complex with organs falling between $\eta_1$ and
$\eta_1 + \delta\eta_1$, $\eta_2$ and $\eta_2 + \delta\eta_2$ ... $\eta_n$ and $\eta_n + \delta\eta_n$. But every time
we integrate with regard to an $\epsilon$, say $\epsilon_{n+1}$, we alter the
constants of each contributory part of $\phi^2$, but do not alter the

* Consider any particular case of the normal curves, the chance of getting a
result between $\epsilon_1$ and $\epsilon_1 + \delta\epsilon_1$ when the distribution is Type VII. is $y_0 e^{-\epsilon_1^2/2\kappa_1^2}\,\delta\epsilon_1$,
where $\kappa_1$ is the standard deviation; similarly with each of the other causes. As
the causes are independent, the product of the various chances gives the required
chance.

triple constitution of $\phi^2$ except to cause one $\epsilon$ to disappear
from its (ii.) and (iii.) constituents. At the same time we
alter the constant without introducing into it any terms in $\eta$. Thus,
finally, after $m - n$ integrations, $\phi^2$ is reduced to its first
constituent, or we conclude that the chance of a complex of
organs between $\eta_1$ and $\eta_1 + \delta\eta_1$, $\eta_2$ and $\eta_2 + \delta\eta_2$ ... $\eta_n$ and
$\eta_n + \delta\eta_n$ occurring is given by

$$P = C e^{-\frac{1}{2}\chi^2}\,\delta\eta_1\,\delta\eta_2\,\delta\eta_3 \ldots \delta\eta_n \qquad \text{(iii.)}$$

where $\chi^2$ is a quadratic function of the $\eta$'s. This is the law of
frequency for the complex.

Consider the expression (iii.) but replace $\chi^2$ by a quadratic
function, then

$$P = C e^{-\frac{1}{2}\{S_1(c_{pp}\eta_p^2) + 2S_2(c_{pq}\eta_p\eta_q)\}} = C e^{-\frac{1}{2}\{c_{11}\eta_1^2 + c_{22}\eta_2^2 + \ldots + 2c_{12}\eta_1\eta_2 + 2c_{13}\eta_1\eta_3 + \ldots\}}$$

Here $C$, $c_{pp}$, $c_{pq}$ are constants, and $S_1$ denotes a summation for
every value of $p$, and $S_2$ for every pair of values of $p$ and $q$ in
the series.

Taking the simplest case, when there are two variables,
this becomes

$$P = C e^{-\frac{1}{2}(c_{11}\eta_1^2 + c_{22}\eta_2^2 + 2c_{12}\eta_1\eta_2)}\ ^{*}$$

Integrate $P$ for all values of $\eta_1$ from $-\infty$ to $+\infty$, and we
must have the normal curve of $\eta_2$ variation.

$$\therefore\quad \frac{1}{2\sigma_2^2} = \frac{c_{22}}{2}\Big(1 - \frac{c_{12}^2}{c_{11}c_{22}}\Big)$$

Similarly integrating for all values of $\eta_2$,

$$\frac{1}{2\sigma_1^2} = \frac{c_{11}}{2}\Big(1 - \frac{c_{12}^2}{c_{11}c_{22}}\Big)$$

Integrating for all values of $\eta_1$ and $\eta_2$ to obtain the total
frequency, we have

$$N = \frac{2\pi C}{\sqrt{c_{11}c_{22} - (c_{12})^2}}$$

* In Appendix III., some integrals connected with the normal curve of error
are dealt with. The result $\dfrac{1}{2\sigma_2^2} = \dfrac{c_{22}}{2}\Big(1 - \dfrac{c_{12}^2}{c_{11}c_{22}}\Big)$ can be reached at once by
rearranging the index in the expression for $P$ as a perfect square
$-\ \dfrac{c_{22}}{2}\Big(1 - \dfrac{c_{12}^2}{c_{11}c_{22}}\Big)\eta_2^2$, as is done in No. (v.) of Appendix III.

Now put $r = -\dfrac{c_{12}}{\sqrt{c_{11}c_{22}}}$ and write $x$ and $y$ for $\eta_1$ and $\eta_2$, and
we have

$$z = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\; e^{-\frac{1}{2}\left\{\frac{x^2}{\sigma_1^2(1-r^2)} - \frac{2rxy}{\sigma_1\sigma_2(1-r^2)} + \frac{y^2}{\sigma_2^2(1-r^2)}\right\}} \qquad \text{(iv.)}$$

The equation just given is a graduation formula capable of
representing tables like that on p. 107. It has been obtained
on certain assumptions which may not all be realised in
practice, but it has a far larger scope in practical work than
the analogous normal curve of error has in frequency-curve
operations.

[7.] Since $e^{-x^2/2\sigma_1^2}$ represents one series of variations and $e^{-y^2/2\sigma_2^2}$
the other, it follows that if there were no correlation at all the
frequency of any particular result, $x$, $y$ would be the product
of the two chances, i.e., proportional to $e^{-\left(\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2}\right)}$, which
means that the size of the term $\dfrac{2rxy}{\sigma_1\sigma_2(1-r^2)}$, or $r$, is a measure
of the correlation. $r$ is called the coefficient of correlation.
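The statement that the surface reduces to a product of two normal curves when there is no correlation can be verified numerically. The sketch below writes equation (iv.) as a function and compares it, for $r = 0$, with the product of two normal curves of error; the function names and the particular values tried are arbitrary choices made for the illustration.

```python
import math

def normal_surface(x, y, s1, s2, r, n=1.0):
    """Frequency surface of equation (iv.) for deviations x, y from the means."""
    q = (x * x / (s1 * s1) - 2 * r * x * y / (s1 * s2) + y * y / (s2 * s2)) / (1 - r * r)
    return n / (2 * math.pi * s1 * s2 * math.sqrt(1 - r * r)) * math.exp(-0.5 * q)

def normal_curve(x, s):
    """Normal curve of error with standard deviation s and unit area."""
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))

x, y, s1, s2 = 1.2, -0.7, 2.0, 3.0
print(normal_surface(x, y, s1, s2, 0.0), normal_curve(x, s1) * normal_curve(y, s2))
print(normal_surface(x, y, s1, s2, 0.5))   # differs once r is not zero
```

Any departure of $r$ from zero brings in the product term in $xy$, and the two expressions no longer agree.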
[8.] Perhaps the easiest way to see how its value can be
obtained from an actual experience is by looking at the
matter from the curve-fitting point of view, and dealing with
the expression for $z$ by moments.

It  will  be  remembered  that  the  moments  were  obtained  by 
summing  for  all  values  of  the  frequencies  multiplied  by  the 
powers  of  the  independent  variable,  but  as  we  now  have  two 
variables  we  can  take  n  powers  of  the  one  and  m  of  the  other. 
Thus,  if  we  take  the  second  powers  of  the  x  distances  and 
the zero power of the $y$ distances (i.e., neglect $y$), we obtain
the  ordinary  second  moment  of  the  frequencies  reckoned  only 
in  the  x  direction.  Similarly,  with  the  second  powers  of  the 
y  distances  and  the  zero  power  of  the  x  distances.  These 
calculations give two of the unknown constants, for $\sigma = \sqrt{\mu_2}$
(see Type VII.). There is, however, another second-order
term to be considered, namely, that obtained by taking the
first powers of both the $x$ distances and the $y$ distances, i.e.,
multiplying the frequencies by $xy$. This may be written:

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} z\,xy\,dx\,dy,$$ where $z$ has the value given above.

This double integral reduces to $Nr\sigma_1\sigma_2$ (see Appendix III.),
or the $(xy)$ moment of the total distribution $= Nr\sigma_1\sigma_2$, or

$$r = \frac{(xy)\ \text{moment}}{N\sigma_1\sigma_2}.$$

To calculate the coefficient of correlation we have,
therefore, to find the $x^2$ moment, the $y^2$ moment and the
$xy$ moment about the centroid vertical.

[9.]  Now  we  have  seen  that  equation  (iv.)  gives  an  expression 
for  describing  a  correlation  table  such  as  the  table  of 
endowment  assurances  on  p.  107,  and  from  that  equation  it  is 
clear  that  the  distribution  of  an  array  of  any  type  t  is 

$$z = z_0 e^{-(g_1x^2 - 2hxt + g_2t^2)}$$

where $g_1$, $h$ and $g_2$ are written for the longer expressions
in $\sigma_1$, $\sigma_2$, and $r$. If we make the index into a perfect square,
we have

$$z = z_0 e^{-g_1\left(x - \frac{ht}{g_1}\right)^2 - \left(g_2 - \frac{h^2}{g_1}\right)t^2} = z_0' e^{-g_1\left(x - \frac{ht}{g_1}\right)^2},$$

the factor which depends only on $t$ being constant for the array and included in $z_0'$.
This last expression is a normal distribution having the same
standard deviation as that of the whole surface; but its mean
differs from that of the whole surface by $\dfrac{ht}{g_1}$, and it follows
that

(1) The deviation of the mean of the array is directly
proportional to the type; or the means of the
arrays increase or decrease in arithmetical
progression or lie on a straight line (called the
regression line).

(2) The standard deviations of all parallel arrays are
equal and independent of their types.

10. Before returning to the statistical example it will be well
to consider the following proof,* which proceeds on the
principle that we require to fit a straight line ($y = a_2 + b_2x$) to
the correlation table.

* This proof is a modification of one given by Mr. G. U. Yule, in the
J.S.S., vol. lx., pp. 812, &c., and Proc. Roy. Soc., 1897, vol. lx., pp. 477, &c. It has
been altered to avoid the introduction of the method of least squares.
[Figure: associated deviations plotted as points, with the abscissae $x_1$, $x_2$, $x_3$ marked.]

Let $x_1y_1$, $x_2y_2$, &c., be associated deviations, and let $y = a_2 + b_2x$
be the straight line used in the graduation; then the graduated
figure corresponding to $x_1$ is $a_2 + b_2x_1$.

Now, if we proceed as we did in fitting frequency-curves by the
method of moments, we make the graduated and ungraduated areas,
means, &c., equal, or

$$(a_2 + b_2x_1) + (a_2 + b_2x_2) + \ldots = y_1 + y_2 + \ldots;$$

or $$Na_2 + b_2S'(x) = S'(y).$$

And $$(a_2 + b_2x_1)x_1 + (a_2 + b_2x_2)x_2 + \ldots = x_1y_1 + x_2y_2 + \ldots;$$

or $$a_2S'(x) + b_2S'(x^2) = S'(xy),$$

where $S'(x)$ gives the first moment of the $x$'s, $S'(y)$ the first moment
of the $y$'s, $S'(x^2)$ the second moment for the $x$'s, and $S'(xy)$ a
moment in which any frequency is multiplied by the product of the
distances in the $x$ and $y$ directions.

If these moments are now transferred to the mean, as was done
in fitting the frequency-curves, we have

$$Na_2 = 0, \quad\text{or}\quad a_2 = 0;$$

and $$b_2S'(x^2) = S'(xy), \quad\text{or}\quad b_2 = \frac{S'(xy)}{S'(x^2)}.$$

But we have already seen that the second moment of the whole
frequency ($N$) is $N\sigma_1^2$;

$$\therefore\quad b_2 = \frac{S'(xy)}{N\sigma_1^2};$$

and similarly, for the straight line $x = a_1 + b_1y$,

$$b_1 = \frac{S'(xy)}{N\sigma_2^2}.$$

If we now write $S'(xy) = N\sigma_1\sigma_2 r$, we have

$$y = r\frac{\sigma_2}{\sigma_1}x \quad\text{and}\quad x = r\frac{\sigma_1}{\sigma_2}y,$$

where $r$ will represent the statistical measure of correlation (coefficient
of correlation) between the $x$'s and $y$'s.

11. At first sight it may appear that the two equations just
given, showing the relationship between x and y, are not
consistent. It must, however, be remembered that the first,
$y = r\frac{\sigma_2}{\sigma_1}x$, gives the mean values of y corresponding to
particular values of x, while the second gives the mean values
of x corresponding to particular values of y. To take a simple
case as an example, assume that $\sigma_1 = \sigma_2 = 1$ and that $r = .1$,
then if x = 0 the mean of the y's corresponding to this value
of x is 0, and if x = 20 the mean of the y's will be 2. When
we turn the matter round, however, we cannot, of course,
assert that the mean of the x's corresponding to y = 2 is 20;
it will be .2.
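The example of Art. 11 can be followed numerically. The sketch below is only an illustration under the stated assumptions ($\sigma_1 = \sigma_2 = 1$, $r = .1$); the function names are invented here and do not come from the text.

```python
def mean_y_given_x(x, r, sigma_x, sigma_y):
    """Mean of the y's for a given x: y = r * (sigma_y / sigma_x) * x."""
    return r * sigma_y / sigma_x * x


def mean_x_given_y(y, r, sigma_x, sigma_y):
    """Mean of the x's for a given y: x = r * (sigma_x / sigma_y) * y."""
    return r * sigma_x / sigma_y * y


# Values assumed in the example of Art. 11.
r, sigma_x, sigma_y = 0.1, 1.0, 1.0

# Starting from x = 20, the mean of the corresponding y's is about 2.
y_bar = mean_y_given_x(20.0, r, sigma_x, sigma_y)

# Turning the matter round, y = 2 does not lead back to x = 20.
x_bar = mean_x_given_y(y_bar, r, sigma_x, sigma_y)

print(y_bar, x_bar)
```

The two lines coincide only when r is unity, which is why going from x to y and back again does not return the starting value.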

12.  After  this  preliminary  remark  we  may  return  to  the 
two  equations  and  consider  how  it  is  that  r  is  a  measure  of 
correlation  and  whether  it  can  always  be  treated  as  a 
satisfactory measure. We can best see that r is a measure of
correlation by rewriting the equation $y = r\frac{\sigma_2}{\sigma_1}x$ in the form
$\frac{y}{\sigma_2} = r\frac{x}{\sigma_1}$, or $Y = Xr$, and we can then interpret it as giving
one characteristic in terms of the other where the mean is the
origin (this is due to referring moments to the mean in the
proof) and the unit of measurement is the standard deviation
in each case. In this form we see at once that as one
characteristic (X) increases the mean (Y) of the corresponding
series of the other characteristic increases to an extent
which depends on the value of r, while if r is negative
Y decreases. It is only if r is unity that the increments
of X and Y become equal and absolute correlation is
reached. If Y remains constant as the value of X increases
the definition at the beginning of this chapter tells us that
there is no correlation, and r in this case is zero, as can easily
be seen from the equation $Y = Xr$. The value of r lies between
-1 and +1 (see Chap. X., Art. 4), and its sign has no influence
on  its  numerical  value.  In  other  words  a  large  negative  value 
does  not  mean  that  the  two  characteristics  do  not  vary 
together  but  only  that  increases  in  the  one  correspond  with 
decreases  in  the  other  ;  the  numerical  value  of  r  indicates  the 
extent  to  which  variations  in  the  two  characteristics  correspond. 
This  indication  is  satisfactory  provided  the  means,  when 
plotted  in  a  diagram  such  as  that  on  p.  108,  fall  approximately 
in  a  straight  line  (i.e.,  "  regression  "*  is  linear).  Distinct 
deviations  from  linearity  are  not  so  common  as  might  be 
supposed,  but  if  they  are  very  marked  in  any  case,  r  ceases 
to  be  an  entirely  satisfactory  measure  of  the  correlation. 

13. We may take this opportunity of removing another
difficulty that is sometimes met. Some students have a
doubt which is best shown by the question, "How can there
be perfect correlation when one thing is always smaller than
another?" As an example we may take the correlation
between the lengths of a man's right arm and his left arm;
here the coefficient of correlation would be practically unity,
and since each characteristic is measured from its own mean,
and in terms of its own standard deviation, the coefficient
would not be decreased if every left arm was a certain number
of inches shorter than the right or if it bore a fixed relation
in length, say 99/100, to the right arm.

14. When dealing with moments on p. 17, we noticed that
though we required to find them about the mean, it was best
in practice to take them about some point fixed arbitrarily, so
as to avoid fractions, and then adjust the results afterwards.
The values of $\sigma_1$ and $\sigma_2$ can, of course, be found with
the help of the formula on p. 19, viz., $\nu_2 = \nu_2' - d^2$. The
deduction of 1/12 from the second moment should be made
for the same reason and in the same cases as in frequency-curve
fitting.

* The term "regression" was invented by Mr. Francis Galton in connection
with the study of heredity; it indicates the way the children of particular
parents tend to "step back" to the ordinary population mean.

With regard to the product moment we have

$$S(x'y') = S(x + d_1)(y + d_2) = S(xy) + d_1 S(y) + d_2 S(x) + N d_1 d_2;$$

or, since $S(x) = S(y) = 0$,

$$S(xy) = S(x'y') - N d_1 d_2,$$

where $S(x'y')$ is calculated about a point distant $d_1$ from the
mean of the x's and $d_2$ from the mean of the y's.

15. The statistical example on p. 107 can now be worked
through.     It  will  be  found  to  make  the  proofs  and  methods 
given  above  much  easier  to  grasp. 
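The adjustment of the product moment in Art. 14 can be checked on a few figures before the full example is taken up. The sketch below is only an illustration: the five pairs of observations and the arbitrary origin (4, 3) are hypothetical values chosen here, not taken from the correlation table.

```python
def product_moment_about_mean(x_dev, y_dev):
    """S(xy) about the means, from deviations taken about an arbitrary point.

    Implements S(xy) = S(x'y') - N * d1 * d2, where d1 and d2 are the
    means of the deviations x' and y'.
    """
    n = len(x_dev)
    d1 = sum(x_dev) / n
    d2 = sum(y_dev) / n
    s_origin = sum(x * y for x, y in zip(x_dev, y_dev))
    return s_origin - n * d1 * d2


# Hypothetical observations, for illustration only.
xs = [2, 4, 4, 7, 9]
ys = [1, 3, 2, 6, 8]

# Deviations about an arbitrarily chosen point (4, 3).
x_dev = [x - 4 for x in xs]
y_dev = [y - 3 for y in ys]
shortcut = product_moment_about_mean(x_dev, y_dev)

# Direct calculation about the true means, for comparison.
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
direct = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

print(shortcut, direct)
```

Both routes give the same product moment, which is the point of working about a convenient origin and adjusting afterwards.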

A point about which moments are to be calculated is first
fixed, say, the middle of the group corresponding to maturity
age 60 and unexpired terms 20-24 years, and for the present
the calculations are made about this point. The following
table shows the calculation of the mean and second moment
of the totals of the y-arrays, i.e., the totals at the bottom of
the table, because columns are y-arrays and rows x-arrays:


   Frequency.     x'     Frequency × x'     Frequency × (x')²

          6       -6           -36                  216
          4       -5           -20                  100
         17       -4           -68                  272
         62       -3          -186                  558
        584       -2        -1,168                2,336
        643       -1          -643                  643
      1,098        0             0                    0
        388        1           388                  388
         60        2           120                  240
          8        3            24                   72

  2,870 = N                 -2,121                4,825
                            +  532
                            -1,589

$$d_1 = -\frac{1589}{2870} = -.553659$$

Hence, the mean age = 60 - 2.76830 = 57.23170, because
the unit of grouping is 5 years.

$$\sigma_1^2 = \frac{4825}{2870} - d_1^2 - \frac{1}{12}\ \text{(Sheppard's adjustment)} = 1.37465 - .08333 = 1.29132$$

$$\sigma_1 = 1.13637$$

Treating  the  rows  in  the  same  way,  the  following  table 
was  formed : — 


   Frequency.     y'     Frequency × y'     Frequency × (y')²

         56       -4          -224                  896
        172       -3          -516                1,548
        432       -2          -864                1,728
        665       -1          -665                  665
        674        0             0                    0
        538        1           538                  538
        247        2           494                  988
         77        3           231                  693
          8        4            32                  128
          1        5             5                   25

  2,870 = N                 -2,269                7,209
                            +1,300
                            -  969

$$d_2 = -\frac{969}{2870} = -.337631$$

Mean unexpired term = 22 - 1.68815 = 20.31185

$$\sigma_2^2 = \frac{7209}{2870} - d_2^2 - \frac{1}{12} = 2.31453$$

and

$$\sigma_2 = 1.52135$$

16. The value of S(x'y') is formed with the help of the numbers
in very small type appearing under the frequencies in the
correlation table. The frequency 62 in the 50 column, for
instance, is distanced three spaces upwards and two sideways
from the arbitrary origin, so the value of x'y' by which it has
to be multiplied is 3 × 2 = 6, as shown in the small type. The
other figures are obtained in like manner, but the sign must
be borne in mind. Any value from the left-hand upper
division of the table, or in the lower right-hand division, will
be positive, because the frequency will be multiplied by a
product of an x and y having like signs, while any value
from the other divisions will be negative, because the x and y
by which the frequencies are multiplied are of opposite signs.

The calculation of the product moment is as follows:

                                                    Total of
  Frequencies.                              x'y'    frequencies    f x'y'
                                                       (f)

  155 + 71 - 84 - 123                         1        +19           +19
  145 + 99 + 11 + 49 - 11 - 52 - 49 - 90      2        102           204
  24 + 36 + 3 + 22 - 22 - 6 - 9               3         48           144
  6 + 6 + 8 + 3 - 6 - 8 - 11 - 2 + 117        4        113           452
  1                                           5          1             5
  3 + 17 + 62 + 2 - 1 - 1 - 2                 6         80           480
  9 + 26                                      8         35           280
  6                                           9          6            54
  2                                          10          2            20
  2 + 2 + 1                                  12          5            60
  1                                          15          1            15
  1                                          18          1            18
  2                                          24          2            48

                                                                   1,799

$$S(xy) = S(x'y') - N d_1 d_2 = 1799 - N d_1 d_2 = 1262.51$$

$$r = \frac{S(xy)}{N\sigma_1\sigma_2} = \frac{1262.51}{2870 \times 1.13637 \times 1.52135} = .25445.$$

The coefficient of correlation between age at maturity and
the unexpired term of endowment assurances is .25445.

The equation representing the one function in terms of the
other is

$$x = .19007\,y$$

where all measurements are made from the mean and the unit
is 5 years. The line drawn in the figure gives this result.
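The closing steps of the example may be put together in a short sketch. It takes the totals already found above (N = 2870, S(x'y') = 1799, the first-moment totals -1589 and -969, and the two standard deviations) as given; the variable names are chosen here for illustration only.

```python
N = 2870
S_XY_ORIGIN = 1799                  # S(x'y') about the arbitrary origin
d1 = -1589 / N                      # mean of the x' deviations
d2 = -969 / N                       # mean of the y' deviations
sigma1, sigma2 = 1.13637, 1.52135   # standard deviations found above

# Transfer the product moment to the mean: S(xy) = S(x'y') - N d1 d2.
s_xy = S_XY_ORIGIN - N * d1 * d2

# Coefficient of correlation and the coefficient in x = (r sigma1 / sigma2) y.
r = s_xy / (N * sigma1 * sigma2)
x_on_y = r * sigma1 / sigma2

print(round(s_xy, 2), round(r, 5), round(x_on_y, 5))
```

Here x stands for age at maturity and y for unexpired term, both measured from their means in units of 5 years, as in the equation above.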
17.  An  alternative  method  similar  to  the  summation  method 
given  in  Art.  9,  Chap.  III.  for  moments  can  be    conveniently 
used  in  connection  with  correlation  tables. 

Taking the same example, we obtain from the given table
another in the same form, giving the y-sum of it by summing
each column continuously, and then form a third table by
summing the second table across continuously.

Table of the y-sum of Correlation Table.

  Unexpired term                      Central Age at Maturity.
  of Endowment
  Assurances.       30     35     40     45     50     55     60     65     70     75   Totals.

    0-4              6      4     17     62    584    643  1,098    388     60      8    2,870
    5-9              4      4     17     60    558    637  1,084    382     60      8    2,814
   10-14             3      3     15     54    496    601  1,044    360     58      8    2,642
   15-19             3      1      6     37    379    502    917    308     50      7    2,210
   20-24             0      1      0     13    234    347    680    224     39      7    1,545
   25-29             0      0      0     10    101    180    409    146     19      6      871
   30-34             0      0      0      1     11     57    178     75      8      3      333
   35-39             0      0      0      0      0      8     51     26      0      1       86
   40-44             0      0      0      0      0      2      2      4      0      1        9
   45-49             0      0      0      0      0      0      0      1      0      0        1

   Totals           16     13     55    237  2,363  2,977  5,463  1,914    294     49   13,381

The totals in the right-hand column of the second table
give the first sum of the total in the right-hand column of the
correlation table, and are the same as the column x = 30 in
the third table. The total of the y-sum, or of the first column
in the xy table, gives the mean of the y's (13,381 ÷ 2,870),
and similarly the sum of the first row gives the mean of the
x's (18,501 ÷ 2,870).


Table of x-sum of above Table, i.e., Table giving all cases for
xy group and over in Correlation Table.

  Unexpired term                      Central Age at Maturity.
  of Endowment
  Assurances.       30      35      40      45      50      55      60      65      70      75   Totals.

    0-4          2,870   2,864   2,860   2,843   2,781   2,197   1,554     456      68       8   18,501
    5-9          2,814   2,810   2,806   2,789   2,729   2,171   1,534     450      68       8   18,179
   10-14         2,642   2,639   2,636   2,621   2,567   2,071   1,470     426      66       8   17,146
   15-19         2,210   2,207   2,206   2,200   2,163   1,784   1,282     365      57       7   14,481
   20-24         1,545   1,545   1,544   1,544   1,531   1,297     950     270      46       7   10,279
   25-29           871     871     871     871     861     760     580     171      25       6    5,887
   30-34           333     333     333     333     332     321     264      86      11       3    2,349
   35-39            86      86      86      86      86      86      78      27       1       1      623
   40-44             9       9       9       9       9       9       7       5       1       1       68
   45-49             1       1       1       1       1       1       1       1       0       0        8

   Totals       13,381  13,365  13,352  13,297  13,060  10,697   7,720   2,257     343      49   87,521

The total of the last table gives the xy moment (87,521),
and the x standard deviation is found by forming from the
first row the series 18501, 15631, 12767, 9907, 7064, 4283,
2086, 532, 76, 8, and summing it, i.e., 70,855. The second
moment about the mean can then be found, the numerical
work being as follows:

$$x\ \text{mean} = \frac{18501}{2870} = 6.4463$$

$$\nu_2 = 2S_3 - d(1 + d) = \frac{2 \times 70855}{2870} - 6.4463 \times 7.4463 = 1.3747$$

Similarly with the y moments

$$y\ \text{mean} = \frac{13381}{2870} = 4.6624$$

$$\nu_2 = \frac{2(13381 + 10511 + 7697 + 5055 + 2845 + 1300 + 429 + 96 + 10 + 1)}{2870} - 4.6624 \times 5.6624 = 2.3979$$

$$\text{The } xy \text{ moment} = \frac{87521}{2870} - 6.4463 \times 4.6624 = .4399$$

Remembering that $\nu_2 - \frac{1}{12}$ (Sheppard's adjustment) $= \sigma^2$
and that the means are in the above work measured from the
centre of the group x = 25, y = -5 to -1 years, the values
just given will be found to agree with those previously
obtained by the direct method. The xy moment (.4399) is
the same as $\frac{1262.51}{2870}$, i.e., $S(xy) \div N$.
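The summation method of Art. 17 lends itself to a short sketch. The y-sum table is entered as printed above; everything else is built from it by continuous summation. The names `tail_sums` and `mean_and_second_moment` are invented here, and the formula $\nu_2 = 2S_3/N - d(1+d)$ is used in the form in which the numerical work applies it.

```python
from itertools import accumulate

N = 2870  # total frequency

# y-sum table: rows are unexpired terms 0-4 ... 45-49,
# columns are central ages at maturity 30 ... 75.
Y_SUM = [
    [6, 4, 17, 62, 584, 643, 1098, 388, 60, 8],
    [4, 4, 17, 60, 558, 637, 1084, 382, 60, 8],
    [3, 3, 15, 54, 496, 601, 1044, 360, 58, 8],
    [3, 1, 6, 37, 379, 502, 917, 308, 50, 7],
    [0, 1, 0, 13, 234, 347, 680, 224, 39, 7],
    [0, 0, 0, 10, 101, 180, 409, 146, 19, 6],
    [0, 0, 0, 1, 11, 57, 178, 75, 8, 3],
    [0, 0, 0, 0, 0, 8, 51, 26, 0, 1],
    [0, 0, 0, 0, 0, 2, 2, 4, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
]


def tail_sums(values):
    """Continuous sums taken from the far end: result[i] = sum(values[i:])."""
    return list(accumulate(reversed(values)))[::-1]


# Third table: each row of the y-sum table summed across from the right.
XY_SUM = [tail_sums(row) for row in Y_SUM]
first_row = XY_SUM[0]                    # 2870, 2864, ..., 8
first_col = [row[0] for row in XY_SUM]   # 2870, 2814, ..., 1


def mean_and_second_moment(first_sums):
    """Return d = S2/N and nu2 = 2*S3/N - d*(1 + d) from a first-sum series."""
    s2 = sum(first_sums)
    s3 = sum(tail_sums(first_sums))
    d = s2 / N
    return d, 2 * s3 / N - d * (1 + d)


x_mean, nu2_x = mean_and_second_moment(first_row)
y_mean, nu2_y = mean_and_second_moment(first_col)
xy_total = sum(sum(row) for row in XY_SUM)
xy_moment = xy_total / N - x_mean * y_mean

print(xy_total, round(x_mean, 4), round(nu2_x, 4))
print(round(y_mean, 4), round(nu2_y, 4), round(xy_moment, 4))
```

Deducting 1/12 from each second moment gives the squares of the standard deviations found by the direct method.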

18.  Before  dealing  with  other  examples  and  methods,  it  may 
be  well  to  point  out  a  use  to  which  the  particular  example 
might  be  put.  The  result  in  the  equation  form  gives  the 
average  age  corresponding  to  each  unexpired  term.  Now, 
we  might  weight  each  entry  with  Mr.  Lidstone's  Z's,*  or  with 
the  temporary  annuities,  then  work  out  an  equation  in  each 
case,  and  get  new  series  of  average  ages.  The  results  used  in 
a  valuation  would  give  the  relative  accuracy  of  the  three 
methods. I have worked out the formula with the Z weights
(H^M Table), and found that

Age at maturity = 57.595 + .1200 × (unexpired term).

The results could also be used as a rough check on the
average ages at valuations, and there certainly seems a
possibility of doing something towards making a simple
"model office" for endowment assurances with the help of
the method we have been using.

* The method used by me was approximate and can probably be improved; the
result is merely given as an indication of a possible line for research.

19.  When  constructing  correlation  tables  a  little  care  is 
necessary,  because  in  certain  arrangements  of  statistical 
material  it  is  possible  to  obtain  a  significant  value  for  a 
coefficient  of  correlation  when  in  reality  the  two  functions  are 
absolutely  uncorrelated.  Such  a  result  is  called  "  spurious 
correlation,"  and  as  the  manner  in  which  it  arises  is  by  the 
use  of  indices,  it  is  defined  as  the  correlation  which  will  be 
found  between  indices,  when  the  absolute  values  of  the 
functions  dealt  with  have  been  selected  purely  at  random. 
As  an  example  of  the  way  spurious  correlation  might  be 
introduced  in  actuarial  statistics,  we  may  refer  to  endowment 
assurances  by  limited  payments  on  the  books  of  a  company 
doing  a  large  quantity  of  such  business,  and  consider  the 
term of the original assurance ($t_1$), the number of premiums
to be paid in future ($t_2$), and the number of years for which
the policy has been in force ($t_3$). If we formed the ratios
$\frac{t_2}{t_1}$ and $\frac{t_3}{t_1}$, and worked out the coefficients of correlation, we
should not obtain a measure of the correlation between
number of premiums payable in future and the number of
years in force, because the result of using fractions with the
same denominator in each would be to exaggerate correlation,
that is, to introduce spurious correlation.

The general propositions of spurious correlation, of which
the result just mentioned is a particular case, are as
follows:

I.  To  find  the  mean  of  an  index  in  terms  of  the  means, 
standard  deviations  and  coefficients  of  correlation  of  the  two 
absolute  measurements. 

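Let $x_1, x_2, x_3, x_4$ be the absolute sizes of any four correlated
subjects; $m_1, m_2, m_3, m_4$ their mean values; $\sigma_1, \sigma_2, \sigma_3, \sigma_4$ their
standard deviations; $r_{12}, r_{13}, r_{14}, r_{23}, r_{24}, r_{34}$ the six coefficients of
correlation; $\epsilon_1, \epsilon_2, \epsilon_3, \epsilon_4$ the deviations of the four subjects from
their means, i.e., $x_1 = m_1 + \epsilon_1$, &c.; $i_{13}$ the mean value of the index
$\frac{x_1}{x_3}$ and $i_{24}$ the mean value of $\frac{x_2}{x_4}$; $\Sigma_{13}$ and $\Sigma_{24}$ the standard
deviations of the indices $\frac{x_1}{x_3}$ and $\frac{x_2}{x_4}$ respectively, and N the total
number of groups.

We shall suppose the ratios of the deviations to the mean values
of the organs are so small that their cubes may be neglected.

Then

$$i_{13} = \frac{1}{N}S\left(\frac{x_1}{x_3}\right) = \frac{m_1}{N m_3}S\left\{\left(1 + \frac{\epsilon_1}{m_1}\right)\left(1 + \frac{\epsilon_3}{m_3}\right)^{-1}\right\} = \frac{m_1}{N m_3}S\left(1 + \frac{\epsilon_1}{m_1} - \frac{\epsilon_3}{m_3} - \frac{\epsilon_1\epsilon_3}{m_1 m_3} + \frac{\epsilon_3^2}{m_3^2}\right).$$

But $S(\epsilon_1) = S(\epsilon_3) = 0$ and $S(\epsilon_1\epsilon_3) = N\sigma_1\sigma_3 r_{13}$ and $S(\epsilon_3^2) = N\sigma_3^2$;

$$\therefore\quad i_{13} = \frac{m_1}{m_3}\left(1 + \frac{\sigma_3^2}{m_3^2} - \frac{\sigma_1}{m_1}\frac{\sigma_3}{m_3}r_{13}\right)$$

and

$$i_{24} = \frac{m_2}{m_4}\left(1 + \frac{\sigma_4^2}{m_4^2} - \frac{\sigma_2}{m_2}\frac{\sigma_4}{m_4}r_{24}\right).$$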

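II. To find the standard deviation of an index in terms of the
standard deviations and coefficient of correlation of the two
absolute measurements.

$$N\Sigma_{13}^2 = S\left(\frac{x_1}{x_3} - i_{13}\right)^2 = \frac{m_1^2}{m_3^2}\,S\left\{\left(1 + \frac{\epsilon_1}{m_1} - \frac{\epsilon_3}{m_3} - \dots\right) - \left(1 + \frac{\sigma_3^2}{m_3^2} - \frac{\sigma_1}{m_1}\frac{\sigma_3}{m_3}r_{13}\right)\right\}^2$$

$$= \frac{m_1^2}{m_3^2}\,S\left(\frac{\epsilon_1}{m_1} - \frac{\epsilon_3}{m_3}\right)^2 + \text{terms of higher order}$$

$$= \frac{m_1^2}{m_3^2}\left(N\frac{\sigma_1^2}{m_1^2} + N\frac{\sigma_3^2}{m_3^2} - 2N\frac{\sigma_1}{m_1}\frac{\sigma_3}{m_3}r_{13}\right),$$

or

$$\Sigma_{13} = i_{13}\sqrt{\frac{\sigma_1^2}{m_1^2} + \frac{\sigma_3^2}{m_3^2} - 2\frac{\sigma_1}{m_1}\frac{\sigma_3}{m_3}r_{13}}.$$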


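III. To find the coefficient of correlation of two indices in terms
of the coefficients of correlation of four absolute measurements and
their standard deviations.

Let $\frac{x_1}{x_3}$ and $\frac{x_2}{x_4}$ be the two indices.

Then, if $\rho$ be the coefficient of correlation of the two indices,

$$\rho N\Sigma_{13}\Sigma_{24} = S\left(\frac{x_1}{x_3} - i_{13}\right)\left(\frac{x_2}{x_4} - i_{24}\right) = i_{13}i_{24}\,S\left(\frac{\epsilon_1}{m_1} - \frac{\epsilon_3}{m_3}\right)\left(\frac{\epsilon_2}{m_2} - \frac{\epsilon_4}{m_4}\right)$$

as we neglect the terms of cubic order.

$$\therefore\quad \rho\Sigma_{13}\Sigma_{24} = i_{13}i_{24}\left(\frac{\sigma_1}{m_1}\frac{\sigma_2}{m_2}r_{12} - \frac{\sigma_1}{m_1}\frac{\sigma_4}{m_4}r_{14} - \frac{\sigma_2}{m_2}\frac{\sigma_3}{m_3}r_{23} + \frac{\sigma_3}{m_3}\frac{\sigma_4}{m_4}r_{34}\right).$$

Hence,

$$\rho = \frac{\dfrac{\sigma_1}{m_1}\dfrac{\sigma_2}{m_2}r_{12} - \dfrac{\sigma_1}{m_1}\dfrac{\sigma_4}{m_4}r_{14} - \dfrac{\sigma_2}{m_2}\dfrac{\sigma_3}{m_3}r_{23} + \dfrac{\sigma_3}{m_3}\dfrac{\sigma_4}{m_4}r_{34}}{\sqrt{\left(\dfrac{\sigma_1^2}{m_1^2} + \dfrac{\sigma_3^2}{m_3^2} - 2\dfrac{\sigma_1}{m_1}\dfrac{\sigma_3}{m_3}r_{13}\right)\left(\dfrac{\sigma_2^2}{m_2^2} + \dfrac{\sigma_4^2}{m_4^2} - 2\dfrac{\sigma_2}{m_2}\dfrac{\sigma_4}{m_4}r_{24}\right)}}.$$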

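Proposition I. shows that the mean of an index is not the ratio of
the means of the corresponding absolute measurements, and
Proposition III. shows that $\rho$ will vanish when the four subjects
forming the indices are quite uncorrelated, while, if two, say, the
third and fourth, are identical, so that $r_{34} = 1$ and
$\frac{\sigma_3}{m_3} = \frac{\sigma_4}{m_4}$, we have

$$\rho = \frac{\dfrac{\sigma_1}{m_1}\dfrac{\sigma_2}{m_2}r_{12} - \dfrac{\sigma_1}{m_1}\dfrac{\sigma_3}{m_3}r_{13} - \dfrac{\sigma_2}{m_2}\dfrac{\sigma_3}{m_3}r_{23} + \dfrac{\sigma_3^2}{m_3^2}}{\sqrt{\left(\dfrac{\sigma_1^2}{m_1^2} + \dfrac{\sigma_3^2}{m_3^2} - 2\dfrac{\sigma_1}{m_1}\dfrac{\sigma_3}{m_3}r_{13}\right)\left(\dfrac{\sigma_2^2}{m_2^2} + \dfrac{\sigma_3^2}{m_3^2} - 2\dfrac{\sigma_2}{m_2}\dfrac{\sigma_3}{m_3}r_{23}\right)}}.$$

This would become applicable in the endowment assurances by limited
payments to which we referred.

An interesting special case arises when the subjects $x_1, x_2, x_3$ are
not correlated and $\frac{x_1}{x_3}$ and $\frac{x_2}{x_3}$ are formed, then

$$\rho = \frac{\dfrac{\sigma_3^2}{m_3^2}}{\sqrt{\left(\dfrac{\sigma_1^2}{m_1^2} + \dfrac{\sigma_3^2}{m_3^2}\right)\left(\dfrac{\sigma_2^2}{m_2^2} + \dfrac{\sigma_3^2}{m_3^2}\right)}}.$$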
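The effect of a common denominator can be seen by evaluating the expression for the correlation of two indices. The sketch below assumes the standard form of Proposition III., writing $v = \sigma/m$ for each subject; the function names and the trial values of v are illustrative only.

```python
from math import sqrt


def index_correlation(v1, v2, v3, v4, r12=0.0, r13=0.0, r14=0.0,
                      r23=0.0, r24=0.0, r34=0.0):
    """Correlation of the indices x1/x3 and x2/x4 (Proposition III).

    Each v is the ratio sigma/m of the corresponding subject; cubes of
    these ratios are neglected, as in the text.
    """
    numerator = r12 * v1 * v2 - r14 * v1 * v4 - r23 * v2 * v3 + r34 * v3 * v4
    denominator = sqrt((v1 ** 2 + v3 ** 2 - 2 * r13 * v1 * v3)
                       * (v2 ** 2 + v4 ** 2 - 2 * r24 * v2 * v4))
    return numerator / denominator


def common_denominator_correlation(v1, v2, v3):
    """Special case: x1, x2, x3 uncorrelated, indices x1/x3 and x2/x3."""
    return v3 ** 2 / sqrt((v1 ** 2 + v3 ** 2) * (v2 ** 2 + v3 ** 2))


# Three uncorrelated subjects of equal relative variability.
rho_equal = common_denominator_correlation(0.1, 0.1, 0.1)

# The same special case reached through the general expression, by
# making the third and fourth subjects identical (r34 = 1, v3 = v4).
rho_general = index_correlation(0.1, 0.2, 0.15, 0.15, r34=1.0)
rho_special = common_denominator_correlation(0.1, 0.2, 0.15)

print(rho_equal, rho_general, rho_special)
```

With equal relative variabilities the two indices show a correlation of about one-half, although the absolute measurements are wholly uncorrelated.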
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CHAPTER  VII. 

Correlation  of  Characters  not  Quantitatively 
Measurable. 

1.  Before  the  theory  in  this  section  is  dealt  with,  we  will  give  a 
table  showing  the  class  of  problem  with  which  it  deals,  drawn 
from  vaccination  statistics.  This  subject  was  brought  before 
the  Institute  of  Actuaries  recently  by  the  late  A.  F.  Burridge, 
but  his  figures  are  not  in  a  form  convenient  for  the  present 
purpose,  and  the  table  is  taken  from  a  paper  on  the  subject 
by  Dr.  W.  R.  Macdonell*  and  relates  to  the  Sheffield 
smallpox  outbreak  of  1887-1888:— 


                             Strength to resist Smallpox when incurred.
  Degree of effective
  vaccination.
  Cicatrix.                  Recoveries.     Deaths.      Total.

  Present                       3,951           200        4,151
  Absent                          278           274          552

  Total                         4,229           474        4,703

The  functions  between  which  we  want  to  find  the 
correlation  are  "  Strength  to  resist  smallpox  when  incurred  " 
and  "  Degree  of  effective  vaccination,"  and  the  statistics  we 
have  cannot  be  arranged  in  a  more  detailed  manner  than  the 
above.  The  characters  cannot  be  measured  quantitatively; 
but  as  the  absence  of  such  measurement  does  not  mean  that 
there  is  no  correlation,  we  must  see  how  the  coefficient  can 
be  obtained  in  such  a  case. 


* Biometrika, vol. i., pp. 375, et seq. This paper and a supplementary one
deal with the subject in a way that shows clearly the strength of the evidence on
the side of vaccination. The question of class is investigated, a practical point
frequently neglected.

Table of Frequencies.

     a         b         a + b
     c         d         c + d

   a + c     b + d         N


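2. Using the same notation as that of the previous chapter,
imagine the frequency surface

$$z = \frac{N}{2\pi\sqrt{1-r^2}\,\sigma_1\sigma_2}\exp\left\{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2} - \frac{2rxy}{\sigma_1\sigma_2}\right)\right\}$$

to be divided into four parts by two planes at right angles to
the axes of x and y at distances h' and k' from the origin, as
suggested by the figures above.

Then

$$d = \frac{N}{2\pi\sqrt{1-r^2}\,\sigma_1\sigma_2}\int_{h'}^{\infty}\int_{k'}^{\infty}\exp\left\{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2} - \frac{2rxy}{\sigma_1\sigma_2}\right)\right\}dx\,dy$$

$$= \frac{N}{2\pi\sqrt{1-r^2}}\int_{h}^{\infty}\int_{k}^{\infty}e^{-\frac{1}{2(1-r^2)}(x^2 + y^2 - 2rxy)}\,dx\,dy$$

by substituting $x^2$ for $\frac{x^2}{\sigma_1^2}$ and $y^2$ for $\frac{y^2}{\sigma_2^2}$, and writing
$h = \frac{h'}{\sigma_1}$ and $k = \frac{k'}{\sigma_2}$.

Further,

$$b + d = \frac{N}{\sqrt{2\pi}\,\sigma_1}\int_{h'}^{\infty}e^{-\frac{x^2}{2\sigma_1^2}}\,dx = \frac{N}{\sqrt{2\pi}}\int_{h}^{\infty}e^{-\frac{1}{2}x^2}\,dx$$

and

$$c + d = \frac{N}{\sqrt{2\pi}}\int_{k}^{\infty}e^{-\frac{1}{2}y^2}\,dy$$

and, remembering that N, the total frequency, $= a + b + c + d$,
we have

$$N - 2(b + d) = N - N\sqrt{\frac{2}{\pi}}\int_{h}^{\infty}e^{-\frac{1}{2}x^2}\,dx$$

$$\therefore\quad \frac{(a + c) - (b + d)}{N} = \sqrt{\frac{2}{\pi}}\int_{0}^{h}e^{-\frac{1}{2}x^2}\,dx$$

and, similarly,

$$\frac{(a + b) - (c + d)}{N} = \sqrt{\frac{2}{\pi}}\int_{0}^{k}e^{-\frac{1}{2}y^2}\,dy.$$

Since a, b, c, and d are known, h and k can be found from
Sheppard's Tables, and the problem becomes
"To find a value for r from the equation

$$d = \frac{N}{2\pi\sqrt{1-r^2}}\int_{h}^{\infty}\int_{k}^{\infty}e^{-\frac{1}{2(1-r^2)}(x^2 + y^2 - 2rxy)}\,dx\,dy,$$

where d, N, h, and k are known."

The solution given by Professor Pearson (see Appendix III.)
leads to the following equation:

$$\frac{ad - bc}{N^2HK} = r + \frac{r^2}{2}hk + \frac{r^3}{6}(h^2 - 1)(k^2 - 1) + \frac{r^4}{24}h(h^2 - 3)k(k^2 - 3)$$

$$+ \frac{r^5}{120}(h^4 - 6h^2 + 3)(k^4 - 6k^2 + 3) + \frac{r^6}{720}h(h^4 - 10h^2 + 15)k(k^4 - 10k^2 + 15)$$

$$+ \frac{r^7}{5040}(h^6 - 15h^4 + 45h^2 - 15)(k^6 - 15k^4 + 45k^2 - 15) + \text{etc.}$$

where $H = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}h^2}$, and $K = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}k^2}$.

The numerical solution has to be obtained by approximating to
the roots, and Newton's method* is convenient for the purpose.
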
3. The numerical work of our example is as follows:

$$\sqrt{\frac{2}{\pi}}\int_{0}^{h}e^{-\frac{1}{2}x^2}\,dx = \frac{(a + c) - (b + d)}{N} = \frac{3755}{4703} = .7984265$$

$$h = 1.27716$$

by interpolation in Sheppard's Tables. In using these tables
for this purpose remember that the value .7984265 corresponds
to $\alpha$ in his notation, so $\frac{1}{2}(1 + .7984265) = .8992132$ must be
looked up inversely in his Table I. If his Table III. be used
it must be entered with .7984265.
Similarly,

$$\sqrt{\frac{2}{\pi}}\int_{0}^{k}e^{-\frac{1}{2}y^2}\,dy = .7652561$$

$$k = 1.18833$$

We next require $\frac{ad - bc}{N^2HK}$; and we first get from Sheppard's
Tables

$$H = .1764870 \quad\therefore\ \log H = \bar{1}.2467127$$

$$K = .1969111 \quad\therefore\ \log K = \bar{1}.2942702$$

Hence

$$\log\frac{ad - bc}{N^2HK} = .1258266$$

and

$$\frac{ad - bc}{N^2HK} = 1.336062$$

Dr. Macdonell gives 56 instead of 62 as the last two
figures; the difference is probably due to interpolation.

* Newton's method of approximating to the root of an equation. Let f(x) = 0
be an equation from which the value of x is to be found and let b be a value
near to x so that x = b + h where h is small, then f(x) = f(b + h) = f(b) + h f'(b) +
terms involving higher powers of h by Taylor's Theorem, and since f(x) = 0, we
have $h = -\frac{f(b)}{f'(b)}$ or $x = b - \frac{f(b)}{f'(b)}$. The chief objection to the method is that
there may be more than one root near the value b, but this does not hold in the
application to correlation. (Cf. Approximations to rate of interest from an
annuity, Text-Book, Part I., p. 110, formula 8.)

Turning to the expression for r, we notice that hk is a
product in the coefficients of $r^2$, $r^4$, $r^6$, &c., so it is well to work
out its value and keep a note of it while the coefficients are
being found. It is also advisable to begin the work by
writing down the first six or seven powers of h and k.

Dr. Macdonell gives the following series:

$$.097083r^7 + .008170r^6 + .119614r^5 + .137450r^4 + .043352r^3 + .758844r^2 + r = 1.336056$$

In order to obtain r we must find a value near the true
one as a first approximation.

Taking $.758844r^2 + r - 1.336056 = 0$ we have

$$r = \frac{-1 + \sqrt{1 + 4 \times 1.336 \times .7588}}{1.5177} = .79$$

Now, this value will be in excess of the truth, owing to
our only using two terms of the series on the left-hand side
of the equation for finding r, and we may take .77 as a trial
rate. Applying Newton's Rule, we have:

$$r = .77 - \frac{-1.336056 + (.77) + .7588(.77)^2 + .0434(.77)^3 + .1375(.77)^4 + .1196(.77)^5 + .0082(.77)^6 + .0971(.77)^7}{1 + 2(.77)(.7588) + 3(.77)^2(.0434) + 4(.77)^3(.1375) + 5(.77)^4(.1196) + 6(.77)^5(.0082) + 7(.77)^6(.0971)}$$

$$= .77 - \frac{.0022}{2.86} = .7692$$

In  work  such  as  this,  a  table  giving  the  first  seven  powers 
of  the  natural  numbers  is  a  help.  There  is  one  in  the  first 
edition  of  Barlow's  Tables,  but  this  edition  is  now  difficult  to 
get,  though  a  copy  of  it  will  be  found  in  the  Library  of  the 
Institute.  A  table  of  the  same  powers  has  been  given  in 
Biometrika,  vol.  ii.,  pp.  474,  et  seq. 
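The work of Art. 3 may be sketched as follows. This is an illustration only: the standard-library `NormalDist` is used in place of Sheppard's Tables, the series is stopped at $r^7$ as above, and the function names are invented here. Since the tabular interpolation is replaced by a direct calculation, the figures agree with those in the text only to about three decimal places.

```python
from math import factorial
from statistics import NormalDist

STD = NormalDist()  # unit normal curve, standing in for Sheppard's Tables


def series_polynomials(t):
    """Polynomials in t multiplying r, r^2/2!, ..., r^7/7! in the series."""
    return [
        1.0,
        t,
        t ** 2 - 1,
        t * (t ** 2 - 3),
        t ** 4 - 6 * t ** 2 + 3,
        t * (t ** 4 - 10 * t ** 2 + 15),
        t ** 6 - 15 * t ** 4 + 45 * t ** 2 - 15,
    ]


def fourfold_r(a, b, c, d, start=0.77, tol=1e-10, max_iter=50):
    """Solve the seven-term series for r by Newton's method."""
    n = a + b + c + d
    h = STD.inv_cdf(0.5 * (1 + ((a + c) - (b + d)) / n))
    k = STD.inv_cdf(0.5 * (1 + ((a + b) - (c + d)) / n))
    target = (a * d - b * c) / (n * n * STD.pdf(h) * STD.pdf(k))
    pairs = zip(series_polynomials(h), series_polynomials(k))
    coeffs = [ph * pk / factorial(i + 1) for i, (ph, pk) in enumerate(pairs)]
    r = start
    for _ in range(max_iter):
        value = sum(cf * r ** (i + 1) for i, cf in enumerate(coeffs)) - target
        slope = sum((i + 1) * cf * r ** i for i, cf in enumerate(coeffs))
        step = value / slope
        r -= step
        if abs(step) < tol:
            break
    return h, k, target, coeffs, r


# Vaccination table: recoveries/deaths by cicatrix present/absent.
h, k, target, coeffs, r = fourfold_r(3951, 200, 278, 274)
print(round(h, 4), round(k, 4), round(target, 4), round(r, 4))
```

Starting from the trial value .77, the iteration settles at about .769, in line with the result of the hand calculation.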


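4. The value we have found for r can be checked in the
following way:

We had

$$d = \frac{N}{2\pi\sqrt{1-r^2}}\int_{h}^{\infty}\int_{k}^{\infty}e^{-\frac{1}{2(1-r^2)}(x^2 + y^2 - 2rxy)}\,dx\,dy$$

$$= \frac{N}{2\pi\sqrt{1-r^2}}\int_{k}^{\infty}e^{-\frac{1}{2}y^2}\int_{h}^{\infty}e^{-\frac{(x - ry)^2}{2(1-r^2)}}\,dx\,dy$$

$$= \frac{N}{2\pi}\int_{k}^{\infty}e^{-\frac{1}{2}y^2}\int_{t}^{\infty}e^{-\frac{1}{2}X^2}\,dX\,dy$$

where $t = \frac{h - ry}{\sqrt{1-r^2}}$.

To approximate to the double integral we can find

$$\frac{1}{\sqrt{2\pi}}\int_{t}^{\infty}e^{-\frac{1}{2}X^2}\,dX$$

for a few equidistant values of t and apply a quadrature
formula. The following table shows the work: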


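Column (1) gives the values of y; column (2) the values of
$t = \frac{h - yr}{\sqrt{1-r^2}}$; column (3) the values of
$\frac{1}{\sqrt{2\pi}}\int_{t}^{\infty}e^{-\frac{1}{2}X^2}dX$; column (4) the values of
$\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}y^2}$; column (5) the product of the two previous
columns; and column (6) the application of Simpson's Rule.

        (1)               (2)        (3)       (4)       (5)           (6)

  (k)     1.18833        .5682      .2849     .1968     .0561     × 1 = .0561
  (k + ½) 1.68833       -.0342      .5133     .0980     .0491     × 4 = .1964
          2.18833       -.6367      .7370     .0363     .0265     × 2 = .0530
          2.68833      -1.2392      .8923     .0107     .0096     × 4 = .0384
          3.18833      -1.8417      .9672     .0025     .0024     × 2 = .0048
          3.68833      -2.4441      .9795     .0004     .0004     × 4 = .0016

                                                                  .3503 ÷ 6
                                                                  = .0584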

The final column shows the application of Simpson's first
quadrature rule, viz.:

$$\int y\,dx = \frac{1}{6}\left\{y_0 + 4y_{\frac{1}{2}} + 2y_1 + 4y_{1\frac{1}{2}} + \dots\right\}$$

and gives .0584 as the value of the double integral; multiplying
this by N (4703), we have 274.6 as the value of the group
called d, which agrees with the figure given in the correlation
table.
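The check of Art. 4 can also be carried out with a finer grid than is practicable by hand. The sketch below is an illustration under stated assumptions: it uses the values h = 1.27716, k = 1.18833 and r = .7692 found above, replaces the tables by the standard-library `NormalDist`, cuts the integral off eight units beyond k, and applies the ordinary composite form of Simpson's rule. The function names are invented here.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # unit normal curve, standing in for Sheppard's Tables


def simpson(f, lower, upper, intervals):
    """Composite Simpson's rule over an even number of equal intervals."""
    if intervals % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    width = (upper - lower) / intervals
    total = f(lower) + f(upper)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(lower + i * width)
    return total * width / 3


def quadrant_frequency(n, h, k, r, span=8.0, intervals=400):
    """Frequency d lying beyond both h and k on the normal surface."""
    def integrand(y):
        t = (h - r * y) / sqrt(1 - r * r)
        # Ordinate at y multiplied by the area of the unit curve beyond t.
        return STD.pdf(y) * (1 - STD.cdf(t))
    return n * simpson(integrand, k, k + span, intervals)


d_estimate = quadrant_frequency(4703, 1.27716, 1.18833, 0.7692)
print(round(d_estimate, 1))
```

With the finer grid the figure should come out within a unit or two of the 274 shown in the correlation table, which is as close as a series stopped at $r^7$ can be expected to bring it.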
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CHAPTER    VIII. 

Probable  Errors. 

1.  In  the  previous  chapters  we  have  assumed  that  the  means, 
standard  deviations,  moments,  constants,  and  coefficients  of 
correlation  obtained  from  a  body  of  statistics  give  an  exact 
measure  of  the  constants  or  of  the  correlation  between  two 
functions.  This  is  not  really  the  case.  If  it  were  possible  to 
make  an  infinite  number  of  trials  bearing  on  a  given  subject, 
we  could  obtain  constants  or  measure  correlation  accurately, 
but  in  practice  it  is  only  possible  to  take  a  sample  from  this 
total  "  population."  The  variation  that  results  from  using  a 
random  sample,  instead  of  the  whole  "population,"  to  find 
the  value  of  any  particular  constant,  could  be  reasonably 
measured  by  the  standard  deviation  of  that  constant,  for,  as 
we  have  already  remarked,  the  standard  deviation  measures 
the  way  statistics  are  collected  round  their  mean  or  their 
"  scatter  "  from  it.  Custom  has,  however,  led  to  the  use  of 
another  function  known  as  the  Probable  Error,  which  is 
•67449  times  the  Standard  Deviation.  The  connection 
between  these  two  functions  is  due  to  the  theory  having  been 
developed  from  the  normal  curve  of  error,  and  arises  in  the 
following way. The probable error gives that value of x
(say p) which divides the part of the normal curve
representing positive errors into two equal portions; it is
therefore given by $\int_{0}^{p}\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}dx = .25$, where the whole area
of the curve (positive plus negative deviations) is unity. In
order to find p in terms of the standard deviation, we have,
therefore, to obtain the value of x corresponding to
$\frac{1}{2}(1 + \alpha) = .75$ in Sheppard's Tables, where $\alpha$ is
$\int_{0}^{x}\frac{2}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}dx$.
This can be done by interpolating inversely by Lagrange's
formula, and p is thus found to be .67449 approximately.
Probably the best way of viewing the use of the probable
error is to regard it as a conventional reduction of the
standard deviation.
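The factor .67449 can be recovered directly. The sketch below uses the standard-library `NormalDist` in place of inverse interpolation in Sheppard's Tables; the function names are invented here for illustration.

```python
from statistics import NormalDist

STD = NormalDist()  # unit normal curve


def probable_error_factor():
    """Value p for which the area of the unit curve from 0 to p is .25."""
    return STD.inv_cdf(0.75)


def probable_error(standard_deviation):
    """Probable error as a conventional reduction of the standard deviation."""
    return probable_error_factor() * standard_deviation


p = probable_error_factor()
area_zero_to_p = STD.cdf(p) - 0.5   # should be one-quarter of the whole area

print(round(p, 5), round(area_zero_to_p, 5))
```

Half of all deviations, positive and negative together, therefore fall within the probable error on either side of the mean.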

2.  The  general  rule  followed  by  statisticians  when  considering 
probable  errors  is  that  unless  a  result  exceeds  the  expected 
by  two  or  three  times  the  probable  error,  it  is  not  safe  to 
assume  that  the  particular  case  differs  from  the  expected 
result. 

3. We will first consider the most simple case, and find the
probable error of an event happening $mp$ times when $m$ trials
are made and $p$ is the probability of the event happening,
and $q$ of its failing. The whole series is given by

$$(p+q)^m = p^m + m p^{m-1} q + \ldots + q^m.$$

Taking moments about the centre of the group represented by $p^m$,
the first moment is

$$m p^{m-1} q + m(m-1) p^{m-2} q^2 + \ldots + m q^m = mq(p+q)^{m-1} = mq.$$

The second moment about the same point is

$$m p^{m-1} q + 2m(m-1) p^{m-2} q^2 + \tfrac{3}{2} m(m-1)(m-2) p^{m-3} q^3 + \ldots + m^2 q^m$$

$$= m p^{m-1} q + m(m-1) p^{m-2} q^2 + \frac{m(m-1)(m-2)}{2} p^{m-3} q^3 + \ldots + m q^m$$

$$\quad + m(m-1) p^{m-2} q^2 + m(m-1)(m-2) p^{m-3} q^3 + \ldots + m(m-1) q^m$$

$$= mq + m(m-1) q^2.$$

The second moment about the mean is, therefore,

$$mq + m(m-1) q^2 - m^2 q^2 = mpq.$$

The standard deviation $= \sqrt{\mu_2} = \sqrt{mpq}$,
and the probable error is $.67449\sqrt{mpq}$.
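
The moments just obtained can be confirmed by summing the terms of the series directly. The sketch below is only an illustration: the values of `m` and `p` are arbitrary, and the function name is not taken from the text. Deviations are counted, as above, from the group $p^m$, so that the $k$th term corresponds to $k$ failures.

```python
from math import comb, sqrt

def moments_about_first_term(m, p):
    """First and second moments of (p + q)^m measured from the group p^m."""
    q = 1.0 - p
    # term k of the expansion: k failures and m - k successes
    terms = [comb(m, k) * p ** (m - k) * q ** k for k in range(m + 1)]
    first = sum(k * t for k, t in enumerate(terms))
    second = sum(k * k * t for k, t in enumerate(terms))
    return first, second

m, p = 12, 0.25                      # illustrative values only
q = 1.0 - p
first, second = moments_about_first_term(m, p)
variance = second - first ** 2       # second moment about the mean
probable_error = 0.67449 * sqrt(variance)
print(first, second, variance)
```

The three printed values should agree with $mq$, $mq + m(m-1)q^2$ and $mpq$ respectively.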

4.  This  value  is  of  considerable  use  in  statistical  work,  and  it 
will,  therefore,  be  advisable  to  see  its  application  to  a  few 
examples. 

It has been remarked that the number of male children
born is to the number of female children born as 1,050 : 1,000;
in other words, the probability of a child being male is
$\frac{1050}{2050}$. If 51,350 out of 100,000 children proved to be males
in a certain community, would it be safe to base any theory
connected with the variation from the usual probability on
the statistics? The expected result is 51,220, and the
probable error is

$$.67449\sqrt{100{,}000 \times \frac{1050}{2050} \times \frac{1000}{2050}} = \pm 106.6.$$

The difference between the actual case and the expected result was
130, and as this is only about one and a quarter times the probable
error, no definite conclusion can be based on the divergence
from the result.

If the number of cases had been 10,000,000, and the
actual number 5,135,000, then the probable error being 1,066,
and the actual difference 13,000, it would have been sufficient
evidence for the conclusion that the ratio 1,050 : 1,000 did not
fit the particular case.
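
The comparison in this example reduces to dividing the observed difference by the probable error and applying the rule of two or three times. A short sketch follows, using the figures of the example; the function and variable names are assumptions made for illustration.

```python
from math import sqrt

PE_RATIO = 0.67449   # probable error as a fraction of the standard deviation

def probable_error_of_count(m, p):
    """Probable error of the number of successes in m trials."""
    return PE_RATIO * sqrt(m * p * (1.0 - p))

p_male = 1050 / 2050
ratios = []
for m, observed in [(100_000, 51_350), (10_000_000, 5_135_000)]:
    expected = m * p_male
    pe = probable_error_of_count(m, p_male)
    ratio = abs(observed - expected) / pe    # difference in units of the probable error
    ratios.append(ratio)
    print(m, round(expected), round(pe, 1), round(ratio, 1))
```

In the smaller experience the difference is little more than one probable error; in the larger it is many times the probable error, which is the distinction drawn in the text.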

5. If the probability of death within a year is .007, the
probable error in 200 cases is $.67449\sqrt{200 \times .007 \times .993} = .80$,
and it would, therefore, be possible to approximate to a
loading for emergencies if 2.2 was taken instead of 1.4 as the
number of deaths expected in a year out of 200 cases on risk
for a year. That is, it would not be unreasonable to treat
.0110 as the rate of mortality instead of .007, in order to
obtain some idea of an emergency loading for term assurances
on the assumption that the number of cases is about 200 and
the average age is such that .007 might be taken as the
probability of death in a year. It has also been assumed that
it is correct to treat each class as if it were subject to its own
rate of mortality, and had to be treated independently of the
rest of the business; this is, however, a debatable point.

6. It will be noticed that if $m$ remains constant, then $\sqrt{mpq}$
has its largest numerical value when $p = q = \frac{1}{2}$, which shows
that an office will generally find that if it has two classes of
equal size, and one is subject to a higher rate of mortality
than the other, the former will have the larger actual
deviations from the expected number of claims, because the
probability of dying in a year only reaches the value $\frac{1}{2}$ at the
end of the mortality table.

7. If we now consider a frequency distribution instead of an
individual experiment, it is clear that if $y_s$ is the theoretical
frequency in the $s$th group that would occur in $N$ cases, the
probability of a particular one of the $N$ cases falling in the
$s$th group is $\frac{y_s}{N} = p$, and the probability of its falling elsewhere
is $1 - \frac{y_s}{N} = q$. Then in $m$ trials the distribution of frequency
of this group will be given by $(p+q)^m$ and its standard
deviation

$$\sigma_{y_s} = \sqrt{mpq} = \sqrt{y'_s\left(1 - \frac{y'_s}{m}\right)}$$

where $y'_s = \frac{m}{N}\,y_s$, and is accordingly the proportion of $y_s$
which we should expect in the typical group of $m$ out of
$N$ individuals. In actual practice we have, however, only the
sample, but since the sample is only likely to deviate from
the theoretical value to a proportionally small extent, we can
replace the theoretical value by the observed frequency,
and write

$$\sigma_{y_s} = \sqrt{y_s\left(1 - \frac{y_s}{m}\right)} \qquad \text{(i.)}$$

where $y_s$ is now taken as the frequency of the $s$th group in
the sample.

8. If one frequency in any distribution is too large, then the
others must on the average be too small, and this shows that
the errors between groups are correlated. The next point to
be investigated is the amount of the correlation between
deviations in $y_s$ and $y_{s'}$, that is, between deviations in the
frequencies of the $s$th and $s'$th groups.

Let $\delta y_s$ = deviation from $y_s$, the most probable value in the $s$th
group; then, since

$$y_1 + y_2 + y_3 + \ldots + y_s + \ldots + y_{s'} + \ldots + y_n = m,$$

the deviations of all the groups taken together must sum to zero.
Now, if the sample has given too large a value in the $s$th group, it is
proper to suppose that this error will be distributed among the other
groups in the proportion of their relative frequencies. This assumes
that deviations are only due to random sampling, not to defective
measurement or classification; if the deviations were due to these
latter causes, it might be reasonable to assume that the excess was
drawn from adjacent groups, but it is necessary to confine our
attention to errors due to random sampling, and we then have

$$\delta y_{s'} = -\delta y_s \times \frac{y_{s'}}{m - y_s}$$

and

$$\delta y_s\,\delta y_{s'} = -(\delta y_s)^2\,\frac{y_{s'}}{m - y_s} = -(\delta y_s)^2\,\frac{y_{s'}}{m}\cdot\frac{1}{1 - y_s/m}.$$

This gives the effect of the error in the $s$th group on that in the
$s'$th group on the assumption that the error in the $s$th group is the
cause of all deviations; to give effect to the fact that all groups
contribute we must sum the expression for all samplings, and obtain

$$\sigma_{y_s}\sigma_{y_{s'}}\,r_{y_s y_{s'}} = -\sigma_{y_s}^2\,\frac{y_{s'}}{m}\cdot\frac{1}{1 - y_s/m}.$$

Hence, using (i.),

$$\sigma_{y_s}\sigma_{y_{s'}}\,r_{y_s y_{s'}} = -\frac{y_s\,y_{s'}}{m} \qquad \text{(ii.)}$$

The only step likely to cause difficulty in the above is in summing
$\delta y_s\,\delta y_{s'}$, but if it is borne in mind that the $(xy)$-moment of a
correlation table gives $r\sigma_1\sigma_2$, the difficulty should immediately
disappear.
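
Results (i.) and (ii.) can be verified exactly for a small case by enumerating every possible sample. The sketch below is an illustration under stated assumptions: three groups with arbitrary probabilities, a sample of twelve, and function names that do not come from the text. The theoretical frequencies are used in place of the observed ones, so the agreement is exact rather than approximate.

```python
from math import factorial
from itertools import product

def multinomial_probability(counts, probs):
    """Probability of one particular set of group frequencies."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    value = float(coef)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value

def exact_product_moment(m, probs, s, t):
    """Mean product of the deviations in groups s and t over all samples of m."""
    e_s = e_t = e_st = 0.0
    for head in product(range(m + 1), repeat=len(probs) - 1):
        if sum(head) > m:
            continue
        counts = head + (m - sum(head),)     # the last group takes the remainder
        pr = multinomial_probability(counts, probs)
        e_s += counts[s] * pr
        e_t += counts[t] * pr
        e_st += counts[s] * counts[t] * pr
    return e_st - e_s * e_t

m = 12
probs = [0.5, 0.3, 0.2]                      # illustrative probabilities only
y = [m * p for p in probs]                   # theoretical frequencies in m cases
var_0 = exact_product_moment(m, probs, 0, 0)
cov_01 = exact_product_moment(m, probs, 0, 1)
print(var_0, y[0] * (1 - y[0] / m))          # compare with (i.) squared
print(cov_01, -y[0] * y[1] / m)              # compare with (ii.)
```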

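9. To find the standard deviation $\sigma_h$ of the mean $h$ of a system of
observations.

Measured from a fixed point we have

$$h = \frac{S(xy)}{S(y)} = \frac{S(xy)}{m}$$

where $y$ is the frequency of size $x$. Hence

$$m\,\delta h = S(x\,\delta y).$$

Squaring each side, we have

$$m^2(\delta h)^2 = S(x_s^2\,\delta y_s^2) + 2S'(x_s x_{s'}\,\delta y_s\,\delta y_{s'})$$

where $S'$ is the sum for all values of $s$ and $s'$ for which $s$ is not
equal to $s'$, each pair being taken once.

By dividing by the number of samplings after summing for all
such samples, this gives

$$m^2\sigma_h^2 = S(x_s^2\,\sigma_{y_s}^2) + 2S'(x_s x_{s'}\,\sigma_{y_s}\sigma_{y_{s'}}\,r_{y_s y_{s'}})$$

or, using (i.) and (ii.),

$$m^2\sigma_h^2 = S(x_s^2 y_s) - S\left(\frac{x_s^2 y_s^2}{m}\right) - 2S'\left(\frac{x_s x_{s'}\,y_s y_{s'}}{m}\right)$$

$$= m\mu'_2 - m\,\frac{S(x_s y_s)}{m} \times \frac{S(x_{s'} y_{s'})}{m}$$

$$= m(\mu'_2 - h^2)$$

where $m\mu'_2$ is the second moment of the whole distribution about the
fixed point. But $\mu'_2 - h^2 = \sigma^2$ = square of standard deviation of
sample. Hence

$$\sigma_h = \frac{\sigma}{\sqrt{m}} \qquad \text{(iii.)}$$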
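
The reduction leading to (iii.) can be followed numerically by building $m^2\sigma_h^2$ term by term from (i.) and (ii.) and comparing the result with $\sigma/\sqrt{m}$. The distribution used below is invented purely for illustration, and the function name is an assumption.

```python
from math import sqrt

def sd_of_mean_from_groups(xs, ys):
    """Standard deviation of the mean built up from (i.) and (ii.)."""
    m = sum(ys)
    # squared deviations of single groups, by (i.)
    total = sum(x * x * y * (1 - y / m) for x, y in zip(xs, ys))
    # product terms for each pair of different groups, by (ii.)
    for s in range(len(xs)):
        for t in range(s + 1, len(xs)):
            total += 2 * xs[s] * xs[t] * (-ys[s] * ys[t] / m)
    return sqrt(total) / m

xs = [0, 1, 2, 3, 4]                 # sizes, measured from a fixed point
ys = [5, 20, 40, 25, 10]             # hypothetical frequencies
m = sum(ys)
h = sum(x * y for x, y in zip(xs, ys)) / m
mu2_fixed = sum(x * x * y for x, y in zip(xs, ys)) / m
sigma = sqrt(mu2_fixed - h * h)
sigma_h = sd_of_mean_from_groups(xs, ys)
print(sigma_h, sigma / sqrt(m))
```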
10. This last result is of considerable use in statistical work.
A  large  number  of  cases  is  recorded  and  the  mean  used  to 
compare  the  particular  experiment  with  another  of  a  like  kind. 
Is  an  actual  difference  between  the  means  due  to  some  cause 
other  than  random  sampling  ?  A  practical  application  would 
be  the  comparison  of  the  average  profit  from  various  classes 
of  business  for  a  number  of  years.  The  standard  deviation 
of  the  profits  in  the  various  years  would  be  obtained  by 
taking  the  square  root  of  the  second  moment  about  the  mean 
and dividing it by the square root of the number of years;
the quotient would give $\sigma_h$ of (iii.). It is only by using the
standard deviations or probable errors deduced from them
that it would be possible to say definitely whether a lower
average  profit  in  a  certain  part  of  the  business  was  due  to 
chance  or  to  some  cause  requiring  removal. 

11.  In  a  similar  way  it  is  possible  to  find  the  probable  errors 
of  the  moments  and  constants,  but  this  leads  to  the  more 
theoretical  parts  of  the  subject,  with  which  it  is  unadvisable 
to  deal  in  a  book  of  this  character.  It  is,  however,  necessary 
to  call  attention  to  the  probable  error  of  the  coefficient  of 
correlation  owing  to  the  importance  of  that  function  in 
statistical  work. 

Its value has been shown* to be $.67449\,\frac{1-r^2}{\sqrt{n}}$. We have
already found in Chapter VI. that the coefficient of correlation
between the age at maturity and the unexpired term of
endowment assurances is .25445, but it is not right to assert
definitely, on the strength of this information, that this
coefficient represents any real relationship, until we have seen
how large a deviation in the value of such a coefficient might
arise purely from our having taken a random sample. The
probable error of $r$ is found by inserting 2870 for $n$ in
the formula given above, and we have $\pm .0118$ as the
probable error. It is customary to show this by writing
$r = .25445 \pm .0118$. In this case, therefore, the probable
error is so small that the result is reliable, but it sometimes
happens, especially when only a few cases have been
considered, that the probable error is large enough to
make it impossible to base any definite conclusion on the
result produced. If, for instance, it had been found that
$r = .0827 \pm .0621$, it would have been impossible to say
definitely that the correlation had not arisen merely from
chance.

* The proof is given in Phil. Trans. A., vol. cxci., pp. 231-241. It is,
however, of a complicated nature and unsuitable for insertion in the present work.
This last remark also applies to the proof for the probable error when the
fourfold table is used.
12. In using the fourfold table the probable errors are larger,
as would be expected, because the grouping is rougher, and
the formula by which they should strictly be calculated
becomes complicated. The formula referred to gives as the
probable error of $r$,

$$\frac{.67449}{\chi_0\sqrt{n}}\left\{\frac{(a+d)(c+b)}{4n^2} + \psi_2^2\,\frac{(a+c)(d+b)}{n^2} + \psi_1^2\,\frac{(a+b)(d+c)}{n^2} + 2\psi_1\psi_2\,\frac{ad-bc}{n^2} - \psi_2\,\frac{ab-cd}{n^2} - \psi_1\,\frac{ac-bd}{n^2}\right\}^{\frac{1}{2}}$$

where

$$\chi_0 = \frac{1}{2\pi\sqrt{1-r^2}}\; e^{-(h^2+k^2-2rhk)/2(1-r^2)}$$

$$\psi_1 = \frac{1}{\sqrt{2\pi}}\int_0^{\frac{h-rk}{\sqrt{1-r^2}}} e^{-\frac{1}{2}x^2}\,dx$$

$$\psi_2 = \frac{1}{\sqrt{2\pi}}\int_0^{\frac{k-rh}{\sqrt{1-r^2}}} e^{-\frac{1}{2}x^2}\,dx$$

and it is assumed that the fourfold table is so arranged that
$a + c > b + d$ and $a + b > c + d$, where $a$, $b$, $c$, and $d$ have the
meanings indicated on p. 126. The numerical work for finding
the probable error of $r$ for the example in Chapter VII. is
as follows:

$$\psi_1 = \frac{1}{\sqrt{2\pi}}\int_0^{\frac{h-rk}{\sqrt{1-r^2}}} e^{-\frac{1}{2}x^2}\,dx = \frac{1}{\sqrt{2\pi}}\int_0^{.56821} e^{-\frac{1}{2}x^2}\,dx = .21505$$

by Sheppard's Tables.

$$\psi_2 = \frac{1}{\sqrt{2\pi}}\int_0^{\frac{k-rh}{\sqrt{1-r^2}}} e^{-\frac{1}{2}x^2}\,dx = \frac{1}{\sqrt{2\pi}}\int_0^{.32230} e^{-\frac{1}{2}x^2}\,dx = .12639$$

$$\chi_0 = \frac{1}{2\pi\sqrt{1-r^2}}\; e^{-(h^2+k^2-2rhk)/2(1-r^2)} = \frac{e^{-.86744}}{2\pi \times .63900} = .10462$$

$\therefore\ \log\frac{1}{\chi_0} = .98039$, $\log\psi_1 = \bar{1}.33254$, and $\log\psi_2 = \bar{1}.10171$;

$\therefore$ the probable error of $r$ is

$$\frac{.67449}{.10462\sqrt{4703}}\left\{.02283 + .00145 + .00479 + .00252 - .00408 - .01015\right\}^{\frac{1}{2}} = \pm .0124.$$


13. It is often sufficient to assume that the probable error by
the above formula will be three times that by the formula
$.67449\,\frac{1-r^2}{\sqrt{n}}$. The latter gives $\pm .0040$ in the above case,
which is a good approximation to $\frac{1}{3}$ of .0124.
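
The arithmetic of the fourfold example can be retraced from the intermediate figures quoted above. The sketch below is an illustration only: it takes the limits .56821 and .32230, the exponent .86744, the value .63900 for $\sqrt{1-r^2}$ and the six bracketed terms as given, reads the number of cases as 4,703, and uses the standard-library error function in place of Sheppard's Tables.

```python
from math import erf, exp, pi, sqrt

PE_RATIO = 0.67449

def psi(limit):
    """Area of the normal curve (unit standard deviation) from 0 to limit."""
    return 0.5 * erf(limit / sqrt(2.0))

psi_1 = psi(0.56821)
psi_2 = psi(0.32230)
root = 0.63900                                    # sqrt(1 - r^2), as quoted
chi_0 = exp(-0.86744) / (2.0 * pi * root)
n = 4703                                          # number of cases, as read from the text
terms = [0.02283, 0.00145, 0.00479, 0.00252, -0.00408, -0.01015]

pe_fourfold = PE_RATIO / (chi_0 * sqrt(n)) * sqrt(sum(terms))
pe_simple = PE_RATIO * root ** 2 / sqrt(n)        # .67449 (1 - r^2) / sqrt(n)
print(round(psi_1, 5), round(psi_2, 5), round(chi_0, 5))
print(round(pe_fourfold, 4), round(pe_simple, 4), round(pe_fourfold / pe_simple, 2))
```

The last figure printed is the ratio of the two probable errors, which may be compared with the rough factor of three suggested above.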
CHAPTER IX.

The Test of Goodness of Fit.

1.  When  the  values  of  ordinates  and  areas  were  calculated  in 
the  examples  of  the  various  types  of  frequency-curves,  no 
systematic  attempt  was  made  to  test  the  graduations,  in  order 
to  ascertain  whether  the  results  obtained  were  reasonable. 
Actuaries  have  generally  been  in  the  habit  of  imposing  on  the 
graduated  values  of  any  table  on  which  they  may  have  been 
working  rough  checks  which  have  amounted  to  a  comparison 
of  the  totals  in  various  groups  and  an  inspection  of  the 
changes  of  sign  in  the  differences  between  the  graduated  and 
ungraduated  figures.  The  problem  of  the  goodness  of  fit 
needs,  however,  more  accurate  treatment,  for  inspection,  even 
when  aided  by  the  calculation  of  a  mean  error  for  each 
group,* can only tell that certain differences are large; and
if  the  mean  error  be  exceeded  in  two  or  three  cases,  it  is 
impossible  to  say  whether  the  excesses  are  in  any  way 
balanced  by  equalities  in  the  rest  of  the  graduation.  A  test 
is  required  which  will  give  some  measure  of  the  disagreement 
as  judged  by  the  whole  graduation. 

2. Now, if there be $N$ observations distributed in $n+1$
groups, the numbers in the groups being $m'_1, m'_2, \ldots, m'_{n+1}$,
we have to find a criterion to enable us to decide when the
series $m_1, m_2, \ldots, m_{n+1}$ will be a legitimate graduation. We
may clearly take a legitimate graduation to be one in which
the observed values ($m'$) do not differ from the theoretical
($m$) by more than the deviations that would be expected in
random sampling. What we require to know is not the

* Generally calculated as $\frac{4}{5}\sqrt{npq}$, which gives approximately the average
magnitude of the deviations irrespective of sign from the mean result.
G.  F.  Hardy,  Journal  of  the  Institute  of  Actuaries,  xxvii.,  pp.  l!14,  et  seq. 
probability that the particular series of $m'$'s will occur if the
$m$'s represent the theory, but the probability that the $m'$'s, or
an equally likely or less likely series, will arise. To appreciate
the  difficulties  of  the  problem  we  may  consider  the  simplest 
case,  that  of  a  coin-tossing  experiment,  and  suppose  that  a 
coin  has  been  tossed  six  times  and  come  down  4  heads  and 
2  tails.  The  "graduation"  we  make  is  3  heads  and  3  tails, 
and  to  test  it  we  require  to  find  the  probability  of  obtaining  a 
result  as  unlikely,  or  more  unlikely,  than  the  observed  one. 
This  probability  is  the  same  as  that  of  getting  any  one  of  the 
following  results  : — 

    6 heads and 0 tails
    5 heads and 1 tail
    4 heads and 2 tails
    2 heads and 4 tails
    1 head  and 5 tails
    0 heads and 6 tails

It  is  obviously  impossible  to  calculate  such  probabilities 
directly,  even  when  the  simple  probabilities  leading  to  the 
deviations  are  known,  in  any  but  the  easiest  cases ;  but  when 
we  do  not  know  the  simple  probabilities,  or  the  case  is  a 
complicated  one,  a  further  difficulty  is  introduced,  owing  to 
our  inability  to  tell  from  a  priori  reasoning  which  of  the 
possible cases are more or less likely than that which has
actually arisen. It would, for instance, be impossible to say,
without a large amount of arithmetical work, when 20 dice
were being thrown, whether the probability of getting ten
"sixes" or more was greater than that of getting two "sixes"
or fewer; but this is an extremely simple case compared with
the  general  proposition  in  which  deviations  over  a  series  of 
numbers  have  to  be  considered. 
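
For these two simple cases the arithmetical work is easily delegated. The following sketch, with illustrative function names, sums the probabilities of all results that are as unlikely as, or more unlikely than, the observed one, and then compares the two dice probabilities mentioned above.

```python
from math import comb

def binomial_probability(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def probability_as_unlikely(observed, n, p):
    """Total probability of results no more likely than the observed one."""
    probabilities = [binomial_probability(k, n, p) for k in range(n + 1)]
    cutoff = probabilities[observed] * (1.0 + 1e-12)   # small allowance for rounding
    return sum(pr for pr in probabilities if pr <= cutoff)

# six tosses of a coin, four heads observed
coin = probability_as_unlikely(4, 6, 0.5)

# twenty dice: ten "sixes" or more against two "sixes" or fewer
ten_or_more = sum(binomial_probability(k, 20, 1 / 6) for k in range(10, 21))
two_or_fewer = sum(binomial_probability(k, 20, 1 / 6) for k in range(0, 3))
print(coin, ten_or_more, two_or_fewer)
```

The coin result is 44/64, the sum of the six cases listed above; for the dice, two "sixes" or fewer proves to be much the more probable of the two events.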

3. If it is assumed in any measurement on one subject that
the deviations from the mean take the form of the normal
curve of error (Type VII.), and it is required to estimate the
chance of obtaining deviations greater than a certain value
($t$, say), it will be necessary to sum all values of the normal
curve beyond $t$ on each side of the mean, i.e., take

$$\int_{-\infty}^{-t} e^{-\frac{1}{2}x^2}\,dx + \int_{t}^{\infty} e^{-\frac{1}{2}x^2}\,dx = 2\int_{t}^{\infty} e^{-\frac{1}{2}x^2}\,dx$$

and divide the result by the area of the whole curve, i.e., by
the total deviations. Assuming that there are two measurements
instead of one (the exposed to risk, for instance, at two
ages), the deviations are, as it were, in two directions instead
of one; and it is necessary to take an expression with two
variables instead of one. The expression analogous to the
normal curve is the correlation surface

$$z = z_0\, e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)}$$

with which we have already dealt. The integrations must be
performed for both variables from $t$ and $t'$ onwards, and
compared with the total. If there are $n$ measurements, it
becomes necessary to deal with a function of $n$ variables, and
this will give the reader a slight idea of the problem from the
mathematical point of view, and suggest that he will expect
the quotient of two $n$-fold integrals to give the probability.
The next step is to reduce these $n$-fold integrals to the form
of ordinary integrals, and it has been shown* that the result

$$P = \frac{\int_{\chi}^{\infty} e^{-\frac{1}{2}\chi^2}\,\chi^{n-1}\,d\chi}{\int_{0}^{\infty} e^{-\frac{1}{2}\chi^2}\,\chi^{n-1}\,d\chi}$$

is reached.† In this expression $\chi$ stands for a complex
function depending on the $n$ variables from which the
expression was evolved, and measures the position that is
indicated by the probability of the particular distribution, the
test for the graduation of which is required.

4. Before a measure of the probability $P$ can be obtained, a
value for $\chi$ must be found from the statistics of the particular
graduation, and in the paper to which reference has already
been made its value is shown to be such that

$$\chi^2 = S\left\{\frac{(m_r - m'_r)^2}{m_r}\right\}$$

It is almost obviously necessary to use the square of the
difference in order that negative differences may, equally with
positive differences, increase the improbability of the system,
while a ratio is required to bring into account the size of the
group; for an error of 15 in a group of 20 would be very
large, but in a group of 1,000 would be negligible.

* Professor Karl Pearson: "On the criterion that a given system of
deviations from the probable in the case of a correlated system of variables
is such that it can be reasonably supposed to have arisen from Random
Sampling." Phil. Mag., July, 1900.

† A fairly extensive table of $P$ will be found in Biometrika, vol. i.,
p. 155, &c. It gives values of $P$ for all values of $n+1$ from 3 to 30,
corresponding to $\chi^2$ from 1 to 30, with a few additional values and auxiliary
tables for the calculation of further values.

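
The two steps of the test, finding $\chi^2$ and then $P$, can be sketched as follows. This is an illustration under stated assumptions rather than the method of the published tables: the ratio of integrals is evaluated by the finite series obtained on integrating by parts, with separate forms for even and odd $n$, and the function names are hypothetical. The argument `groups` is the number of groups $n+1$.

```python
from math import erfc, exp, pi, sqrt

def chi_square(theoretical, observed):
    """Sum of (m - m')^2 / m over the groups."""
    return sum((m - mo) ** 2 / m for m, mo in zip(theoretical, observed))

def pearson_p(chi2, groups):
    """P for a given chi-squared and number of groups (n + 1)."""
    n = groups - 1
    if n % 2 == 0:
        # even n: exp(-chi2/2) * sum of (chi2/2)^j / j! for j = 0 .. n/2 - 1
        term, total = 1.0, 1.0
        for j in range(1, n // 2):
            term *= (chi2 / 2.0) / j
            total += term
        return exp(-chi2 / 2.0) * total
    # odd n: normal tail plus a series in odd powers of chi
    chi = sqrt(chi2)
    term, total = chi, 0.0
    for j in range(1, (n - 1) // 2 + 1):
        if j > 1:
            term *= chi2 / (2 * j - 1)
        total += term
    return erfc(chi / sqrt(2.0)) + sqrt(2.0 / pi) * exp(-chi2 / 2.0) * total

print(chi_square([20.0], [35.0]), chi_square([1000.0], [1015.0]))
print(round(pearson_p(3.0, 10), 6), round(pearson_p(4.0, 10), 6))
```

The first line printed contrasts an error of 15 in a group of 20 with the same error in a group of 1,000; the second evaluates $P$ for ten groups at $\chi^2 = 3$ and 4.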
5.  The  practical  aspects  of  the  test  of  fit  and  its  application 
may  now  be  dealt  with. 

(1)  If  the  facts  representing  the  graduated  and 
ungraduated  figures  are  only  available  in  groups,  then  the  value 
of  the  probability  by  the  test  will,  as  a  rule,  be  lower  as  the 
number  of  groups  is  increased.  This  practical  point  should  be 
borne  in  mind,  as  it  sometimes  happens  that  graduations  are 
tested  in  groups  of,  say,  5  years  of  age;  but  the  graduated 
figures  for  individual  ages  are  then  used  unreservedly,  though, 
strictly speaking, they may be no better than interpolated values.

(2)  The  test  assumes  a  distribution,  and  would  not  be 
applicable if the numbers were a series of ordinates, though
the  application  of  the  test  would  probably  give  a  fair  idea  of 
the  goodness  of  fit  if  a  large  number  of  ordinates  had  been 
given  in  the  series. 

(3) The tails of the experience will be very small and
never fit exactly. "We ought to take our final theoretical
groups to cover as much of the tail area as amounts to at
least a unit of frequency in such cases." (Phil. Mag.,
footnote, p. 164.)

(4) If the number of observations be multiplied by some
factor, and the deviations are also multiplied by that factor,
then the value of $\chi^2$ will be multiplied by the same figure, and
the test will show that the fit is worse. This may seem
strange at first, but a little consideration will show that it is
reasonable, since a large number of cases will give smoother
series than a small number; then, if the results are proportionally
the same in two examples having the same theoretical
distribution but different total frequencies, it follows that
the one with greater frequency is less probable than the
one with less frequency. The probability of a result as bad,
or worse, than three heads and one tail in coin tossing (two
heads and two tails being the theoretical result) is .625; but
the probability of a result as bad, or worse, than 3 × 2 = 6
heads and 1 × 2 = 2 tails is .289.

(5) I have found, in applying the test, that when the
numbers dealt with are very large, the probability is often
small, even though the curve appears to fit the statistics very
closely. The explanation is that the statistics with which we
deal in practice nearly always contain a certain amount of
extraneous matter, and the heterogeneity is concealed in a
small experience by the roughness of the data. The increase
in the number of cases observed removes the roughness, but
the heterogeneity remains. The meaning, from the curve-fitting
point of view, is that the experience is really made
up of more than one frequency-curve; but a certain curve,
approximating to the one calculated, predominates.

(6)  It  is  sometimes  thought  that  the  introduction  of 
additional  constants  must  necessarily  improve  the  fit  of  a 
curve.  It  may  do  so  in  some  cases,  but  it  is  quite  possible  to 
take  a  curve  with  ten  constants  and  find  it  gives  a  worse 
result  than  another  having  only  three. 

(7)  It  may  sometimes  be  advisable  to  use  a  curve  giving  a 
slightly  worse  agreement  than  another  for  simplicity,  or  for 
reasons  such  as  those  which  prompt  actuaries  to  employ 
Makeham's  hypothesis  ;  but,  as  a  rule,  it  is  well  to  use  the 
best-fitting  curve  in  any  case,  that  is,  the  curve  giving  the 
highest  value  of  P. 
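
The coin-tossing figures quoted under (4) can be reproduced directly. The sketch below, with illustrative function names, counts the results deviating from the theoretical number of heads by as much as, or more than, the observed number, and shows at the same time that $\chi^2$ is doubled when the numbers are doubled.

```python
from math import comb

def chi_square(theoretical, observed):
    """Sum of (m - m')^2 / m over the groups."""
    return sum((m - mo) ** 2 / m for m, mo in zip(theoretical, observed))

def probability_as_bad(heads, tosses):
    """Chance of a deviation from tosses/2 at least as great as that observed."""
    expected = tosses / 2
    deviation = abs(heads - expected)
    favourable = sum(comb(tosses, k) for k in range(tosses + 1)
                     if abs(k - expected) >= deviation)
    return favourable / 2 ** tosses

p_small = probability_as_bad(3, 4)             # three heads and one tail
p_large = probability_as_bad(6, 8)             # six heads and two tails
chi2_small = chi_square([2, 2], [3, 1])
chi2_large = chi_square([4, 4], [6, 2])
print(p_small, p_large, chi2_small, chi2_large)
```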

6. In a recent paper "On the Comparative Reserves of Life
Assurance Companies, &c." (Journal of the Institute of
Actuaries, xxxvii., pp. 458-9), Mr. King remarked that it
is permissible to use the H^M Model Office for the O^M; and it
will be interesting to apply the formulas given above to see
what is the probability of the O^M distribution if the H^M be
taken as the theoretical distribution:


Central Age     Policies issued arranged        O^M - H^M         Square of
 of Group.          in Age-Groups.                               (O^M - H^M)
                  H^M           O^M            +         -          ÷ H^M

    20            6.97          7.30          .33                    .02
    25           17.75         20.45         2.70                    .41
    30           21.04         23.11         3.07                    .45
    35           18.41         18.40                   .01           .00
    40           13.82         13.05                   .77           .04
    45            9.45          8.44                  1.01           .11
    50            6.23          5.07                  1.16           .22
    55            3.51          2.58                  1.93          1.60
    60            1.97          1.20                   .77           .30
    65             .85           .40                   .45           .24

                100.00        100.00         6.10     6.10      χ² = 3.39

There are ten groups, and $\chi^2 = 3.39$, and the table in
Biometrika gives $P = .964295$ and $.911413$ when $\chi^2 = 3$ and 4
respectively. There is, however, a further point, for it has to
be decided if it is sufficient to test for 100 new policies. 500
would reduce the probability to about .05, which means that
in only one case out of twenty would a random sampling
lead to a system of deviations from the H^M as great as that
shown by the O^M. This result will remind the student of the
great danger of dealing with percentages without considering
the actual number of cases investigated. Mr. King's other
table, which is of greater importance in his work (policies
according to attained age), shows a much closer agreement,
as $P$ is .831051 for 10,000 cases.
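
The effect of the number of cases on a comparison of percentages can be sketched as follows. The value 3.39 for 100 policies and the ten groups are taken from the text; multiplying $\chi^2$ by five for 500 policies follows from the remark that $\chi^2$ grows in proportion to the number of observations when the percentages are unchanged. The function name and the series used to evaluate $P$ are assumptions made for illustration.

```python
from math import erfc, exp, pi, sqrt

def pearson_p(chi2, groups):
    """P for a given chi-squared and number of groups (n + 1), by finite series."""
    n = groups - 1
    if n % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, n // 2):
            term *= (chi2 / 2.0) / j
            total += term
        return exp(-chi2 / 2.0) * total
    chi = sqrt(chi2)
    term, total = chi, 0.0
    for j in range(1, (n - 1) // 2 + 1):
        if j > 1:
            term *= chi2 / (2 * j - 1)
        total += term
    return erfc(chi / sqrt(2.0)) + sqrt(2.0 / pi) * exp(-chi2 / 2.0) * total

chi2_per_100 = 3.39                    # value quoted for 100 new policies
p_100 = pearson_p(chi2_per_100, 10)
p_500 = pearson_p(5 * chi2_per_100, 10)
print(round(p_100, 3), round(p_500, 3))
```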

In a paper on Makeham's formula (Journal of the Institute
of Actuaries, xxxv.) Mr. Calderon gave some graduations of
the H^F mortality table, and on pp. 188 and 189 his results
are summarized in a form which is convenient for applying
the test. His methods A, B and C, give 35.11, 34.02 and
37.77 as the values of $\chi^2$ for 20 groups. The probability if
$\chi^2 = 30$ is .051798, and if $\chi^2 = 40$ is .003272. The odds
against the best of the three graduations must be 30 to 1,
which shows that Makeham's formula is unsuitable or the
methods of application unsatisfactory.
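
The odds of about 30 to 1 are what simple proportional parts between the two tabulated probabilities give for $\chi^2 = 34.02$. The sketch below, which is an illustration and not a reconstruction of the original working, compares that interpolation with a direct evaluation of $P$; since $P$ falls away more and more slowly over this range, the direct value is somewhat smaller, and the odds against the graduation correspondingly longer. Function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt

def pearson_p(chi2, groups):
    """P for a given chi-squared and number of groups (n + 1), by finite series."""
    n = groups - 1
    if n % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, n // 2):
            term *= (chi2 / 2.0) / j
            total += term
        return exp(-chi2 / 2.0) * total
    chi = sqrt(chi2)
    term, total = chi, 0.0
    for j in range(1, (n - 1) // 2 + 1):
        if j > 1:
            term *= chi2 / (2 * j - 1)
        total += term
    return erfc(chi / sqrt(2.0)) + sqrt(2.0 / pi) * exp(-chi2 / 2.0) * total

p_30, p_40 = 0.051798, 0.003272        # tabulated values for 20 groups
chi2_best = 34.02                      # the best of the three graduations

p_linear = p_30 + (chi2_best - 30.0) / 10.0 * (p_40 - p_30)
odds_linear = (1.0 - p_linear) / p_linear
p_direct = pearson_p(chi2_best, 20)
odds_direct = (1.0 - p_direct) / p_direct
print(round(p_linear, 4), round(odds_linear, 1))
print(round(p_direct, 4), round(odds_direct, 1))
```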

In the numerical examples of Chapter V., the value of
P for Type I. is about .98. Type II. gives .7, Type IV. f,
and the sums assured in Type VII. .2, and the reserves give
a probability greater than .9.

7.  The  only  other  point  to  which  reference  is  necessary  is  the 
actual  value  of  P  at  which  a  good  fit  ends  and  a  bad  one 
begins.  It  is  impossible  to  fix  such  a  value.  We  have  merely 
a  measure  of  probability  for  the  whole  table,  and  if  the  odds 
against  the  graduation  are  twenty  or  thirty  to  one  the  result 
is  unsatisfactory ;  if  they  are  ten  to  one  the  graduation  is  not 
unreasonable,  but  the  exact  value  when  a  result  must 
be  discarded  cannot  be  given.  As,  however,  it  is  clearly 
impossible  to  imagine  any  test  which  can  fix  an  absolutely 
definite standard, there is no reason for objecting to the
particular  method  because  it  fails  to  do  so. 
CHAPTER X.


The  Theory  of  Contingency. 

1.  A  table  showing  no  correlation  can  be  formed  from  the 
totals  of  any  ordinary  correlation  table,  such  as  that 
on  p.  107,  by  dividing  the  total  of  each  column  in 
proportion  to  the  totals  of  the  rows.  Thus,  the  first 
column  would  be — 

Unexpired Term                      0-4              5-9         ...

Frequency with no correlation   6 × 56/2870     6 × 172/2870     ...

and  the  remaining  part  of  the  table  would  be  formed 
in  a  similar  way.  A  moment's  consideration  of  the 
definition  given  at  the  beginning  of  Chapter  VI.  will 
show  that  such  a  method  of  formation  must  necessarily 
give  the  required  table,  because,  since  each  column  is  formed 
in  proportion  to  the  total,  the  means  of  the  columns  must 
all  be  the  same  as  the  mean  of  the  total,  which  shows  at 
once  from  the  definition  that  no  correlation  can  exist  in 
such  a  table. 
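
The formation of the table with no correlation can be sketched as follows. The marginal totals are those of the table of unexpired term and age at maturity as read from this chapter; the function name is an assumption. The last lines confirm the remark just made, that every column of such a table has the same mean as the total column.

```python
def no_correlation_table(row_totals, column_totals):
    """Divide each column total in proportion to the totals of the rows."""
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]

# totals for unexpired terms 0-4, 5-9, ..., 45-49
row_totals = [56, 172, 432, 665, 674, 538, 247, 77, 8, 1]
# totals for central ages at maturity 30, 35, ..., 75
column_totals = [6, 4, 17, 62, 584, 643, 1098, 388, 60, 8]

table = no_correlation_table(row_totals, column_totals)
print(round(table[0][0], 1), round(table[1][0], 1))   # first column: 0-4 and 5-9

# mean unexpired term (in units of the row number) in each column and in the total
n = sum(row_totals)
overall_mean = sum(i * r for i, r in enumerate(row_totals)) / n
column_means = [sum(i * table[i][j] for i in range(len(row_totals))) / c
                for j, c in enumerate(column_totals)]
```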

2.  The  following  table  shows  the  figures  exhibiting  no 
correlation  in  ordinary  type,  and  those  exhibiting  correlation  in 
small  type.  Now,  if  these  two  sets  of  figures  coincide  exactly 
in  any  particular  case  there  is  clearly  no  correlation  in  the 
table ;  if  they  differ  slightly  there  is  a  slight  amount,  and  if 
they  differ  greatly  there  is  a  considerable  amount  of  correlation, 
and we come therefore to the conclusion that an alternative
method  of  finding  the  correlation  between  two  things  is  by 
measuring  the  difference  between  the  figures  in  the  actual 
correlation  table  and  those  that  would  have  arisen  if  there 
had  not  been  any  correlation.  In  the  last  chapter  we 
discussed  a  method  of  measuring  the  goodness  of  fit  (or 
amount  of  agreement)  between  two  sets  of  figures,  and  this 
suggests that we might calculate $\chi^2$ by squaring the difference
between  each  pair  of  figures  in  the  table  and  dividing  the 
result  by  the  frequency  when  there  is  no  correlation.  The 
reason  for  choosing  the  figure  from  the  table  with  no 
correlation  as  the  divisor  is  that  it  always  has  a  value,  while 
the  correlation  table  may  give  a  frequency  of  zero,  which 
renders  it  impossible  to  use  the  latter  as  a  divisor. 


(In each cell the upper figure, in ordinary type in the original, is the
frequency with no correlation; the lower figure, in small type in the
original, is the frequency of the actual correlation table.)

Unexpired                         Central Age at Maturity.
term of
Endowment
Assurances.     30     35     40     45     50     55     60     65     70     75    Total.

   0-4          .1     .1     .3    1.2   11.4   12.5   21.4    7.6    1.2     .2       56
                 2      -      -      2     26      6     14      6      -      -

   5-9          .4     .2    1.0    3.7   35.0   38.6   65.8   23.2    3.6     .5      172
                 1      1      2      6     62     36     40     22      2      -

  10-14         .9     .6    2.6    9.3   87.8   96.8  165.4   58.4    9.0    1.2      432
                 -      2      9     17    117     99    127     52      8      1

  15-19        1.4     .9    3.9   14.4  135.3  149.0  254.4   89.9   13.9    1.9      665
                 3      -      6     24    145    155    237     84     11      -

  20-24        1.4     .9    4.0   14.6  137.2  151.0  257.8   91.1   14.1    1.9      674
                 -      1      -      3    133    167    271     78     20      1

  25-29        1.1     .8    3.2   11.6  109.5  120.6  205.9   72.7   11.2    1.4      538
                 -      -      -      9     90    123    231     71     11      3

  30-34         .5     .4    1.5    5.3   50.3   55.3   94.4   33.4    5.2     .7      247
                 -      -      -      1     11     49    127     49      8      2

  35-39         .2     .1     .5    1.7   15.7   17.2   29.4   10.4    1.6     .2       77
                 -      -      -      -      -      6     49     22      -      -

  40-44         .0     .0     .0     .2    1.6    1.8    3.1    1.1     .2     .0        8
                 -      -      -      -      -      2      2      3      -      1

  45-49         .0     .0     .0     .0     .2     .2     .4     .2     .0     .0        1
                 -      -      -      -      -      -      -      1      -      -

  Total          6      4     17     62    584    643  1,098    388     60      8    2,870


3.  As  it  is  clear  that  %2  will  give  a  measure  of  the  correlation, 
ir  will  be  interesting  to  sec  the  connection  between  it  and  the 
coefficient  of  correlation  r;  and  the  following  proof  shows  that 


/  *2 


here  <£- 


N 


and  the  correlation  tabic  can  be  approximately  represented 
by  the  normal  correlation  surface. 
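Using the same notation as that of Chapter VI., the frequency with no correlation is given by

$$z_0=\frac{N}{2\pi\sigma_1\sigma_2}\,e^{-\frac12\left(\frac{x^2}{\sigma_1^2}+\frac{y^2}{\sigma_2^2}\right)}$$

while that with correlation is

$$z=\frac{N}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\,e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_1^2}-\frac{2rxy}{\sigma_1\sigma_2}+\frac{y^2}{\sigma_2^2}\right)}$$

Then, since the total frequency is N in each case,

$$\phi^2+1=\frac{1}{N}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\frac{(z-z_0)^2}{z_0}\,dx\,dy+1=\frac{1}{N}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\frac{z^2}{z_0}\,dx\,dy$$

$$=\frac{1}{2\pi(1-r^2)}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}e^{-\frac12\left\{x'^2\frac{1+r^2}{1-r^2}-\frac{4rx'y'}{1-r^2}+y'^2\frac{1+r^2}{1-r^2}\right\}}dx'\,dy'$$

where $x'=x/\sigma_1$ and $y'=y/\sigma_2$,

$$=\frac{1}{1-r^2}\Bigg/\sqrt{\frac{(1+r^2)^2}{(1-r^2)^2}-\frac{4r^2}{(1-r^2)^2}}=\frac{1}{1-r^2}$$

by No. (vi.) of Appendix III. Hence

$$\phi^2=\frac{r^2}{1-r^2}\qquad\text{or}\qquad r=\sqrt{\frac{\phi^2}{1+\phi^2}}.$$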
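The identity reached in the proof can be checked numerically. The sketch below is an illustration only: the function name `one_plus_phi2`, the grid width and the step are assumptions of this example, not part of the original text. It evaluates the double integral of $z^2/(Nz_0)$ in standardised units by the midpoint rule and recovers r from the resulting $\phi^2$.

```python
import math

def one_plus_phi2(r, half_width=10.0, step=0.05):
    """Midpoint-rule estimate of (1/N) * double integral of z^2 / z0,
    worked in standardised units x' = x/sigma_1, y' = y/sigma_2."""
    n = int(round(2 * half_width / step))
    a = (1 + r * r) / (1 - r * r)   # coefficient of x'^2 and of y'^2 in the index
    b = 2 * r / (1 - r * r)         # the cross term in the index is -2*b*x'*y'
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * step
        for j in range(n):
            y = -half_width + (j + 0.5) * step
            total += math.exp(-0.5 * (a * x * x - 2 * b * x * y + a * y * y))
    return total * step * step / (2 * math.pi * (1 - r * r))

def r_from_phi2(phi2):
    """Coefficient recovered from phi^2 by r = sqrt(phi^2 / (1 + phi^2))."""
    return math.sqrt(phi2 / (1 + phi2))

if __name__ == "__main__":
    for r in (0.3, 0.6):
        value = one_plus_phi2(r)
        print(r, value, 1 / (1 - r * r), r_from_phi2(value - 1))
```

For moderate values of r the numerical integral should agree closely with $1/(1-r^2)$; for r near unity the surface becomes very narrow and a finer step would be needed.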
4. The result just obtained may be considered a little more closely, as it leads to some valuable conclusions:

(1) It shows clearly that r must lie between -1 and +1.

(2) Since the value of $\phi^2$ will not be affected by the order of the columns (or rows), it will be seen that it is permissible to interchange them, provided, of course, the whole column (or row) be moved at once.

(3) The proof shows that r will not necessarily be obtained exactly if a very small number of groups is used, because by using the integral calculus an infinite number of groups was assumed.

(4) We also assumed, however, that we were dealing with perfectly smooth series; but since $\chi^2$ is a measure of the goodness of fit between the correlation and no-correlation figures, a very large number of groups gives undue prominence to the chance deviation, due to the use of a random sample, and the value of r found from that of $\phi^2$ may differ considerably from the value reached by the product-moment. Too fine a grouping may give a less accurate result than a less fine one.

5. These conclusions are borne out by practical work, and any student who cares to go into the subject can find the value of r by the two methods from a large table, using various groupings, and he will see that the best agreements are obtained when the grouping is neither very fine nor very rough. It should, however, be borne in mind that unless the correlation table takes the form assumed in the proof, an exact agreement between the two methods cannot be expected; it is, for this reason, well to distinguish the value $\sqrt{\phi^2/(1+\phi^2)}$ from the value r by calling the former the coefficient of contingency. It seems to me that if the difficulty about grouping could be overcome, the coefficient of contingency would be more useful than the coefficient of correlation (r).
6. The following table shows the table given above grouped so as to enable us to obtain the coefficient of contingency more conveniently:

Unexpired term                      Central Age at Maturity
of Endowment      30, 35,
Assurances.       40 & 45      50        55        60        65     70 & 75    Total.

    0-9              7.0      46.4      51.1      87.2      30.8       5.5       228
                    14        88        42        54        28         2

   10-14            13.4      87.8      96.8     165.4      58.4      10.2       432
                    28       117        99       127        52         9

   15-19            20.6     135.3     149.0     254.4      89.9      15.8       665
                    33       145       155       237        84        11

   20-24            20.9     137.2     151.0     257.8      91.1      16.0       674
                     4       133       167       271        78        21

   25-29            16.7     109.5     120.6     205.9      72.7      12.6       538
                     9        90       123       231        71        14

   30-34             7.7      50.3      55.3      94.6      33.4       5.9       247
                     1        11        49       127        49        10

   35-49             2.7      17.5      19.2      32.9      11.7       2.0        86
                     0         0         8        51        26         1

   Total            89       584       643     1,098       388        68      2,870

Working out $\chi^2$ from this table, the value is found to be 257.9,* which gives $\phi^2=\chi^2/N=.0899$, and the coefficient of contingency is .2872. This differs from the value of r found by the other method by about .03; but an inspection of the table on p. 107 leads to the conclusion that the totals cannot be considered to be curves of Type VII., and the condition for

$$r=\sqrt{\frac{\phi^2}{1+\phi^2}}$$

is not therefore satisfied.
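The working of the coefficient of contingency can be sketched in a few lines. The block below is an illustration under stated assumptions: the observed counts are those read from the grouped table of paragraph 6 (their row and column sums reproduce the printed totals), the expected figures are recomputed from the marginal totals rather than taken from the rounded entries of the table, and the function name `contingency` is hypothetical. The result is of the same order as the 257.9 of the text but need not agree with it exactly.

```python
import math

# Observed frequencies: rows are unexpired terms 0-9, 10-14, ..., 35-49;
# columns are central ages at maturity (30-45), 50, 55, 60, 65, (70 & 75).
observed = [
    [14, 88, 42, 54, 28, 2],
    [28, 117, 99, 127, 52, 9],
    [33, 145, 155, 237, 84, 11],
    [4, 133, 167, 271, 78, 21],
    [9, 90, 123, 231, 71, 14],
    [1, 11, 49, 127, 49, 10],
    [0, 0, 8, 51, 26, 1],
]

def contingency(table):
    """Return (chi^2, phi^2, coefficient of contingency) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # no-correlation figure
            chi2 += (obs - expected) ** 2 / expected
    phi2 = chi2 / n
    return chi2, phi2, math.sqrt(phi2 / (1 + phi2))

if __name__ == "__main__":
    chi2, phi2, coefficient = contingency(observed)
    print(round(chi2, 1), round(phi2, 4), round(coefficient, 4))
```

Interchanging whole rows or whole columns of `observed` leaves all three results unchanged, which is conclusion (2) of paragraph 4.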


7. The probable error of the coefficient of contingency may be taken as approximately one and a third times that of r.

8. Though we have dealt with the theory of contingency from the point of view of its particular application to ordinary correlation tables, the reader should bear in mind that in statistical practice its chief use is when characters not capable of quantitative measurement are being examined, such, for instance, as colours, shapes, diseases, &c. The method of application is, of course, exactly the same.


* In case any student may not follow the method easily, we may mention that the contributions to $\chi^2$ from the first column are 7.0, 15.9, 7.5, 13.7, 3.6, 5.8, and 2.7.
9. We may now, in conclusion, refer briefly to some of the practical applications to which the theories of correlation and contingency can be put. Some actuarial applications have already been made, such as the investigation by Professor Pearson and Miss Beeton into the inheritance of duration of life (see Journal of the Institute of Actuaries, vol. xxxv., pp. 112, et seq., and 458, et seq.; and Biometrika, vol. i., part I.); while the correlations between ages of husband and wife and age at death of a man and the number of his children under age 21 are obvious applications. The last-mentioned case seems to suggest that it might be possible to apply the method in connection with the valuation of pension funds, while we have already noticed that it is possible that it might be of use for checking average ages, &c., in endowment assurance valuations.
APPENDIX I.


USEFUL CONSTANTS.


$$\begin{aligned}
e &= 2.71828\ 18285\\
e^{-1} &= .36787\ 94412\\
\pi &= 3.14159\ 26536\\
\log_{10}e &= .43429\ 44819\\
\log_e 10 &= 2.30258\ 509\\
\log_{10}(\log_{10}e) &= \bar{1}.63778\ 43113\\
\log_{10}\pi &= .49714\ 98727\\
\log_{10}\sqrt{\pi} &= .24857\ 49363\\
\log_{10}\frac{1}{\sqrt{2\pi}} &= \bar{1}.60091\ 00658\\
\log_{10}e^{-\frac{1}{12}} &= \bar{1}.96380\ 87932
\end{aligned}$$


APPENDIX II.


B AND Γ FUNCTIONS.


$$B(m,n)=\int_0^1x^{m-1}(1-x)^{n-1}dx$$

$$\Gamma(p)=\int_0^\infty e^{-x}x^{p-1}dx$$

I. $$\int x^{p-1}e^{-x}dx=-e^{-x}x^{p-1}+(p-1)\int x^{p-2}e^{-x}dx$$

by integration by parts.

When $p-1$ is positive, $e^{-x}x^{p-1}$ vanishes when $x=0$, and when $x=\infty$ it can be written $x^{p-1}/e^{x}$, and the rule for evaluation of undetermined forms (Edwards' Diff. Calc., ch. xiv.) can be applied.

$$\therefore\ \int_0^\infty x^{p-1}e^{-x}dx=(p-1)\int_0^\infty x^{p-2}e^{-x}dx$$

$$\Gamma(p)=(p-1)\Gamma(p-1).$$

If p be an integer, $\Gamma(p)=(p-1)!$.


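II. To prove $B(m,n)=\dfrac{\Gamma(m)\Gamma(n)}{\Gamma(m+n)}$.

Putting zx for x in the equation for $\Gamma(m)$, we have

$$\Gamma(m)=\int_0^\infty e^{-zx}z^mx^{m-1}dx$$

and

$$\Gamma(m)e^{-z}z^{n-1}=\int_0^\infty e^{-z(1+x)}z^{m+n-1}x^{m-1}dx.$$

Integrating both sides with respect to z from 0 to $\infty$,

$$\Gamma(m)\Gamma(n)=\int_0^\infty x^{m-1}\left\{\int_0^\infty e^{-z(1+x)}z^{m+n-1}dz\right\}dx.$$

But if $z(1+x)=y$, we get

$$\int_0^\infty e^{-z(1+x)}z^{m+n-1}dz=\frac{1}{(1+x)^{m+n}}\int_0^\infty e^{-y}y^{m+n-1}dy=\frac{\Gamma(m+n)}{(1+x)^{m+n}}$$

$$\therefore\ \Gamma(m)\Gamma(n)=\Gamma(m+n)\int_0^\infty\frac{x^{m-1}}{(1+x)^{m+n}}dx.$$

But putting $1+x=\dfrac{1}{1-z}$ in this integral, we obtain

$$\int_0^1\frac{z^{m-1}}{(1-z)^{m-1}}(1-z)^{m+n}\frac{dz}{(1-z)^2}$$

which reduces at once to $B(m,n)$, and therefore $\Gamma(m)\Gamma(n)=\Gamma(m+n)B(m,n)$.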
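As an illustrative check of the relation just proved, the following sketch compares a direct midpoint-rule evaluation of the B integral with the ratio of Γ functions supplied by Python's standard `math.gamma`. The function names and the number of steps are assumptions of this example only.

```python
import math

def beta_numeric(m, n, steps=100_000):
    """Midpoint-rule value of the integral of x^(m-1) (1-x)^(n-1) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x ** (m - 1) * (1 - x) ** (n - 1)
    return total * h

def beta_from_gamma(m, n):
    """B(m, n) from the identity Gamma(m) Gamma(n) / Gamma(m + n)."""
    return math.gamma(m) * math.gamma(n) / math.gamma(m + n)

if __name__ == "__main__":
    print(beta_numeric(2.5, 3.5), beta_from_gamma(2.5, 3.5))
```

The midpoint rule is adequate here because both indices exceed unity, so the integrand vanishes at each end of the range; for m or n below 1 the integrand is infinite at a limit and a different quadrature would be wanted.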

III. To prove $\Gamma(\tfrac12)=\sqrt{\pi}$.

We have already shown in the proof for Type VII. that $2\int_0^\infty e^{-x^2}dx=\sqrt{\pi}$, and by putting $x^2=z$, we have

$$2\int_0^\infty e^{-x^2}dx=\int_0^\infty e^{-z}z^{-\frac12}dz=\Gamma(\tfrac12)=\sqrt{\pi}.$$

For statistical work a table of $\Gamma(x)$ or $\log\Gamma(x)$ is required, and Legendre has given a table of the latter to twelve figures for values of x between 1 and 2, from which $\log\Gamma(x)$ can be found easily, provided x is small.

When x is large, $\log\Gamma(x)$ can be approximated to. The best known approximation is*

$$\Gamma(x+1)=\sqrt{2\pi x}\;x^xe^{-x}e^{\frac{1}{12x}}$$

or

$$\log_{10}\Gamma(x+1)=\log_{10}\sqrt{2\pi}+(x+\tfrac12)\log_{10}x-\left(x-\frac{1}{12x}\right)\log_{10}e$$

and it can be used when x is not less than 8. To show how the table of $\log\Gamma(x)$ is used, and also how the approximation approaches the true value, the following table has been prepared:

Table comparing True and Approximate Values of log Γ(x).

                      Log Γ(x)
     x          True.       Approximate.        Δ

   1.372      1̄.948975
   2.372       .086329        .086743        .000414
   3.372       .461444        .461532        .000088
   4.372       .989332        .989369        .000037
   5.372      1.630012       1.630025        .000013
   6.372      2.360148       2.360156        .000008
   7.372      3.164424       3.164430        .000006
   8.372      4.032009       4.032010        .000001
   9.372      4.954838       4.954837       -.000001
  10.372      5.926670       5.926669       -.000001

A six-figure table of $\log\Gamma(x)$, obtained by abridging Legendre, is given on pp. 166 and 167.

* A proof of this well-known approximation will be found in Chrystal's Algebra, vol. ii., pp. 308, &c., or in Boole's Finite Differences, chapter VI.
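The comparison of true and approximate values can be reproduced as follows. This is a sketch only: the true value is taken from Python's standard `math.lgamma` rather than from Legendre's table, and the function names are assumptions of the example. The approximation for $\Gamma(t+1)$ is applied with $t=x-1$ so as to give $\log_{10}\Gamma(x)$.

```python
import math

LOG10_E = math.log10(math.e)

def log10_gamma_true(x):
    """log10 Gamma(x) from the standard library's natural-log gamma function."""
    return math.lgamma(x) / math.log(10)

def log10_gamma_approx(x):
    """log10 Gamma(x) from the approximation for Gamma(t + 1), with t = x - 1."""
    t = x - 1.0
    return (0.5 * math.log10(2 * math.pi)
            + (t + 0.5) * math.log10(t)
            - (t - 1.0 / (12.0 * t)) * LOG10_E)

if __name__ == "__main__":
    for x in (2.372, 5.372, 10.372):
        true_value = log10_gamma_true(x)
        approx_value = log10_gamma_approx(x)
        print(x, round(true_value, 6), round(approx_value, 6),
              round(approx_value - true_value, 6))
```

The recurrence $\Gamma(p)=(p-1)\Gamma(p-1)$ of section I. is what allows a table running only from 1 to 2 to serve for larger arguments: for instance $\log_{10}\Gamma(2.372)=\log_{10}1.372+\log_{10}\Gamma(1.372)$.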
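APPENDIX III.


The Integration of some Expressions connected with the Normal Curve of Error.

1. On page 91 we showed that

$$\int_{-\infty}^{+\infty}e^{-x^2}dx=\sqrt{\pi}\qquad\text{(i.)}$$

and

$$\int_{-\infty}^{+\infty}e^{-x^2/2\sigma^2}dx=\sigma\sqrt{2\pi}\qquad\text{(ii.)}$$

Since the curve is symmetrical, we have

$$\int_{-\infty}^{+\infty}x^{2n+1}e^{-x^2}dx=\text{zero}\qquad\text{(iii.)}$$

If we integrate $\int x^{2n}e^{-x^2}dx$ by parts, we have

$$\int x^{2n}e^{-x^2}dx=\frac{x^{2n+1}}{2n+1}e^{-x^2}+\int 2x\,\frac{x^{2n+1}}{2n+1}e^{-x^2}dx$$

and inserting the limits $-\infty$ and $+\infty$ we have

$$\int_{-\infty}^{+\infty}x^{2n}e^{-x^2}dx=\frac{2}{2n+1}\int_{-\infty}^{+\infty}x^{2n+2}e^{-x^2}dx\qquad\text{(iv.)}$$

This last formula shows the connection between the successive even moments.

2. Referring to p. 112, let

$$z=z_0\,e^{-\frac{1}{2(1-r^2)}\left\{\frac{x^2}{\sigma_1^2}-\frac{2rxy}{\sigma_1\sigma_2}+\frac{y^2}{\sigma_2^2}\right\}}$$

Then z can be put in the form

$$z_0\,e^{-\frac{1}{2(1-r^2)}\left\{\frac{x}{\sigma_1}-\frac{ry}{\sigma_2}\right\}^2}e^{-\frac{1}{2(1-r^2)}\cdot\frac{y^2(1-r^2)}{\sigma_2^2}}
=z_0\,e^{-\frac{1}{2(1-r^2)\sigma_1^2}\left\{x-\frac{r\sigma_1y}{\sigma_2}\right\}^2}e^{-\frac{y^2}{2\sigma_2^2}}$$

Then

$$\int_{-\infty}^{+\infty}z\,dx=z_0\sqrt{2\pi(1-r^2)}\,\sigma_1e^{-y^2/2\sigma_2^2}\quad\text{by (ii.)}$$

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}z\,dx\,dy=\int_{-\infty}^{+\infty}z_0\sqrt{2\pi(1-r^2)}\,\sigma_1e^{-y^2/2\sigma_2^2}dy$$

$$=2\pi\sigma_1\sigma_2\sqrt{1-r^2}\;z_0\qquad\text{(v.)}$$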
3. Using the same method as that just given, it can be shown that if $ac>b^2$

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}e^{-\frac12(ax^2-2bxy+cy^2)}dx\,dy=\frac{2\pi}{\sqrt{ac-b^2}}\qquad\text{(vi.)}$$

for the index can be written

$$-\frac{a}{2}\left\{x-\frac{by}{a}\right\}^2-\frac{y^2}{2a}(ac-b^2)$$

and if we then integrate with respect to x, we have

$$\sqrt{\frac{2\pi}{a}}\;e^{-\frac{y^2}{2a}(ac-b^2)}$$

and if $ac-b^2$ is positive, we can integrate this last expression with respect to y and have

$$\sqrt{\frac{4\pi^2}{ac-b^2}}=\frac{2\pi}{\sqrt{ac-b^2}}.$$

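4. We will now find $\displaystyle\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}zxy\,dx\,dy$.

Proceeding as in (v.), we have

$$\int_{-\infty}^{+\infty}zxy\,dx=\int_{-\infty}^{+\infty}xy\,z_0\,e^{-\frac{1}{2(1-r^2)\sigma_1^2}\left\{x-\frac{r\sigma_1y}{\sigma_2}\right\}^2}e^{-\frac{y^2}{2\sigma_2^2}}dx$$

$$=z_0\,y\,e^{-y^2/2\sigma_2^2}\int_{-\infty}^{+\infty}\left(X+\frac{r\sigma_1y}{\sigma_2}\right)e^{-\frac{X^2}{2(1-r^2)\sigma_1^2}}dX$$

where $X=x-\dfrac{r\sigma_1y}{\sigma_2}$,

$$=z_0\,\frac{r\sigma_1}{\sigma_2}\,y^2e^{-y^2/2\sigma_2^2}\int_{-\infty}^{+\infty}e^{-\frac{X^2}{2(1-r^2)\sigma_1^2}}dX$$

because by (iii.) $\displaystyle\int_{-\infty}^{+\infty}e^{-\frac{X^2}{2(1-r^2)\sigma_1^2}}X\,dX$ is zero

$$=z_0\,\frac{\sigma_1^2}{\sigma_2}\,r\sqrt{2\pi(1-r^2)}\;y^2e^{-y^2/2\sigma_2^2}.$$

But, by putting $n=0$ in (iv.) and using (ii.), we have

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}zxy\,dx\,dy=z_0\,r\,\sigma_1^2\sigma_2^2\,2\pi\sqrt{1-r^2}$$

$$=N\sigma_1\sigma_2r\qquad\text{(vii.)}$$

because by (v.)

$$z_0=\frac{N}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}.$$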
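Results (v.) and (vii.) lend themselves to a direct numerical check. The sketch below is illustrative only: the function name, the grid and the particular values of $\sigma_1$, $\sigma_2$ and r are assumptions of the example. It sums z and zxy over a fine grid and compares the results with $2\pi\sigma_1\sigma_2\sqrt{1-r^2}\,z_0$ and with $r\sigma_1\sigma_2$ times that quantity.

```python
import math

def surface_integrals(sigma1, sigma2, r, z0=1.0, n=320):
    """Midpoint-rule estimates of the double integrals of z and of z*x*y,
    taken over -8 sigma to +8 sigma in each variable."""
    hx, hy = 16 * sigma1 / n, 16 * sigma2 / n
    k = -1.0 / (2 * (1 - r * r))
    total = 0.0
    product = 0.0
    for i in range(n):
        x = -8 * sigma1 + (i + 0.5) * hx
        for j in range(n):
            y = -8 * sigma2 + (j + 0.5) * hy
            index = k * ((x / sigma1) ** 2
                         - 2 * r * x * y / (sigma1 * sigma2)
                         + (y / sigma2) ** 2)
            z = z0 * math.exp(index)
            total += z
            product += z * x * y
    return total * hx * hy, product * hx * hy

if __name__ == "__main__":
    s1, s2, r = 1.5, 0.8, 0.6
    total, product = surface_integrals(s1, s2, r)
    print(total, 2 * math.pi * s1 * s2 * math.sqrt(1 - r * r))  # result (v.)
    print(product / total, r * s1 * s2)                         # result (vii.), per unit of N
```

Dividing the second integral by the first removes $z_0$ and N, so that the ratio is simply the mean product $r\sigma_1\sigma_2$.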
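5. We may now deal with the problem referred to in Chap. VII.: "To find a value of r from the equation

$$\frac{d}{N}=\frac{1}{2\pi\sqrt{1-r^2}}\int_h^\infty\int_k^\infty e^{-\frac{x^2+y^2-2rxy}{2(1-r^2)}}dx\,dy$$

where d, N, h, and k are known."

Consider the expression

$$\frac{1}{\sqrt{1-r^2}}\,e^{-(x^2+y^2-2rxy)/2(1-r^2)}=U\ \text{say}\qquad(\alpha)$$

and expand it in terms of r by Maclaurin's theorem, then

$$U=e^{-\frac12(x^2+y^2)}\left\{u_0+\frac{r}{1!}u_1+\frac{r^2}{2!}u_2+\dots+\frac{r^n}{n!}u_n+\dots\right\}\qquad(\beta)$$

where

$$u_n=e^{\frac12(x^2+y^2)}\left(\frac{d^nU}{dr^n}\right)_{r=0}\qquad(\gamma)$$

Now take the logarithmic differential coefficient of U with respect to r, and we have

$$\frac{1}{U}\frac{dU}{dr}=-\frac{d}{dr}\left\{\frac{x^2+y^2-2rxy}{2(1-r^2)}\right\}+\frac{r}{1-r^2}$$

$$=-r(x^2+y^2-2rxy)(1-r^2)^{-2}+(1-r^2)^{-1}xy+r(1-r^2)^{-1}$$

$$\therefore\ (1-r^2)^2\frac{dU}{dr}=U\{xy+r(1-x^2-y^2)+r^2xy-r^3\}$$

Differentiate n times by Leibnitz's theorem and put $r=0$ and we have

$$u_{n+1}=n(2n-1-x^2-y^2)u_{n-1}-n(n-1)(n-2)^2u_{n-3}+xy\{u_n+n(n-1)u_{n-2}\}$$

Hence

$$\begin{aligned}
u_0&=1\\
u_1&=xy\\
u_2&=(x^2-1)(y^2-1)\\
u_3&=x(x^2-3)\,y(y^2-3)\\
u_4&=(x^4-6x^2+3)(y^4-6y^2+3)
\end{aligned}\qquad(\delta)$$

The laws indicated by ($\delta$) are

$$\begin{aligned}
u_n&=v_nw_n\\
v_n&=xv_{n-1}-(n-1)v_{n-2}\\
w_n&=yw_{n-1}-(n-1)w_{n-2}
\end{aligned}\qquad(\epsilon)$$

and we can, therefore, re-write ($\beta$) as

$$\frac{1}{2\pi}U=\frac{1}{2\pi}e^{-\frac12(x^2+y^2)}\left(1+\frac{r}{1!}v_1w_1+\frac{r^2}{2!}v_2w_2+\dots\right)\qquad(\zeta)$$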
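Integrating this from h to $\infty$ with respect to x, and remembering that $w_n$ does not involve x, we have

$$\frac{1}{2\pi}\int_h^\infty U\,dx=\frac{1}{\sqrt{2\pi}}e^{-\frac12y^2}\left\{\frac{1}{\sqrt{2\pi}}\int_h^\infty e^{-\frac12x^2}dx+\frac{r}{1!}w_1\frac{1}{\sqrt{2\pi}}\int_h^\infty e^{-\frac12x^2}v_1dx+\dots+\frac{r^n}{n!}w_n\frac{1}{\sqrt{2\pi}}\int_h^\infty e^{-\frac12x^2}v_ndx+\dots\right\}$$

$$=\frac{1}{\sqrt{2\pi}}e^{-\frac12y^2}\left\{V_0+\frac{r}{1!}w_1V_1+\dots+\frac{r^n}{n!}w_nV_n+\dots\right\}$$

where $V_n$ is written for $\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_h^\infty v_ne^{-\frac12x^2}dx$.

Now integrate this with respect to y from k to $\infty$, remembering that $V_n$ does not involve y, and writing $W_n$ for $\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_k^\infty e^{-\frac12y^2}w_ndy$, we see that $\dfrac{1}{2\pi}\displaystyle\int_h^\infty\int_k^\infty U\,dx\,dy$ can be expressed as a series of which the general term is $\dfrac{r^n}{n!}V_nW_n$, and we must now evaluate $V_n$ and $W_n$.

From ($\epsilon$) it can be shown by induction* that the general form of $v_n$ is

$$v_n=x^n-\frac{n(n-1)}{2}\frac{x^{n-2}}{1!}+\frac{n(n-1)(n-2)(n-3)}{2^2}\frac{x^{n-4}}{2!}-\text{\&c.}\qquad(\eta)$$

Now we notice that

$$\frac{dv_n}{dx}=nv_{n-1}\qquad(\theta)$$

$$\therefore\ \text{by }(\epsilon)\quad v_n=xv_{n-1}-\frac{dv_{n-1}}{dx}$$

Multiply by $e^{-\frac12x^2}$ and integrate

$$\int e^{-\frac12x^2}v_ndx=\int xe^{-\frac12x^2}v_{n-1}dx-\int e^{-\frac12x^2}\frac{dv_{n-1}}{dx}dx$$

and integrating the latter integral by parts we have

$$\int_h^\infty e^{-\frac12x^2}v_ndx=e^{-\frac12h^2}\,v_{n-1}(h).$$

Now, writing H for $\dfrac{1}{\sqrt{2\pi}}e^{-\frac12h^2}$, and K for $\dfrac{1}{\sqrt{2\pi}}e^{-\frac12k^2}$, we have from ($\alpha$)

$$\frac{d}{N}=\frac{1}{2\pi}\int_h^\infty\int_k^\infty U\,dx\,dy=\sum_{n=0}^\infty\frac{r^n}{n!}V_nW_n$$

$$=\frac{(b+d)(c+d)}{N^2}+\sum_{n=1}^\infty\frac{r^n}{n!}\,HK\,v_{n-1}(h)\,w_{n-1}(k)$$

or remembering that $N=a+b+c+d$, we write

$$\frac{ad-bc}{N^2HK}=r+\frac{r^2}{2}hk+\frac{r^3}{6}(h^2-1)(k^2-1)+\frac{r^4}{24}h(h^2-3)k(k^2-3)$$

$$+\frac{r^5}{120}(h^4-6h^2+3)(k^4-6k^2+3)+\frac{r^6}{720}h(h^4-10h^2+15)k(k^4-10k^2+15)$$

$$+\ \text{\&c.}\qquad\text{(viii.)}$$

* The proof is as follows:

$$xv_{n-1}-(n-1)v_{n-2}=x^n-\frac{(n-1)(n-2)}{2}\frac{x^{n-2}}{1!}+\frac{(n-1)(n-2)(n-3)(n-4)}{2^2}\frac{x^{n-4}}{2!}-\dots$$

$$-(n-1)x^{n-2}+\frac{(n-1)(n-2)(n-3)}{2}\frac{x^{n-4}}{1!}-\dots$$

$$=x^n-\frac{n(n-1)}{2}\frac{x^{n-2}}{1!}+\frac{n(n-1)(n-2)(n-3)}{2^2}\frac{x^{n-4}}{2!}-\dots$$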
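The use of series (viii.) can be sketched as follows. This is an illustration under stated assumptions, not the working of the text: h and k are found from the marginal proportions $(b+d)/N$ and $(c+d)/N$ with the standard library's `statistics.NormalDist`, the series is cut off after a fixed number of terms, and the equation is solved for r by simple bisection between -0.9 and +0.9. The function names `hermite`, `series` and `solve_r` are hypothetical.

```python
import math
from statistics import NormalDist

def hermite(n, x):
    """v_n(x) built from v_n = x*v_{n-1} - (n-1)*v_{n-2}, with v_0 = 1 and v_1 = x."""
    prev, cur = 1.0, x
    if n == 0:
        return prev
    for m in range(2, n + 1):
        prev, cur = cur, x * cur - (m - 1) * prev
    return cur

def series(r, h, k, terms=40):
    """Right-hand side of (viii.): sum over n >= 1 of r^n/n! * v_{n-1}(h) * v_{n-1}(k)."""
    return sum(r ** n / math.factorial(n) * hermite(n - 1, h) * hermite(n - 1, k)
               for n in range(1, terms + 1))

def solve_r(a, b, c, d, terms=40):
    """Solve (viii.) for r given the four cell frequencies a, b, c, d."""
    n_total = a + b + c + d
    std = NormalDist()
    h = std.inv_cdf(1 - (b + d) / n_total)   # assumption: (b+d)/N lies beyond h
    k = std.inv_cdf(1 - (c + d) / n_total)   # assumption: (c+d)/N lies beyond k
    target = (a * d - b * c) / (n_total ** 2 * std.pdf(h) * std.pdf(k))
    lo, hi = -0.9, 0.9
    for _ in range(60):                      # bisection; the series rises with r
        mid = (lo + hi) / 2
        if series(mid, h, k, terms) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    print(solve_r(400, 200, 200, 400))
```

When both divisions fall at the medians, so that $h=k=0$, the series reduces to the expansion of $\sin^{-1}r$, which gives a convenient check. The series converges slowly when r is near unity, and more terms would then be required than are used here.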




APPENDIX   IV. 


Alternative  Systems  of  Frequency-Curves. 

This Appendix deals with some of the systems of frequency-curves which have been suggested instead of those of Chapter IV. Objection to Pearson's system has not been generally directed against its practical sufficiency, but has rather been founded on the contention that the theoretical basis of the system is insufficient. It is interesting to note in this connection that in a paper read before the Statistical Society this year, Professor F. Y. Edgeworth, who has himself suggested other methods, points out that Pearson's Generalized Probability Curve appears more justifiable in the matter of a priori justification the longer its philosophic basis is subjected to criticism. With these prefatory remarks, we may turn to the suggested methods.

I. Method of Translation. As the normal curve has an approved theoretical basis, graduation might be effected by using $y=e^{-[f(x)]^2}$. This merely conceals the use of an absolutely general expression, and one still requires to know what forms are best for $f(x)$. It is hard to see why the normal curve should be held to be anything more than a first approximation to a general result. For a fuller account of this method, the reader may refer to F. Y. Edgeworth, Journal of Statistical Society, vol. lxi., pp. 675-689, or J. C. Kapteyn, "Skew Frequency-Curves in Biology and Statistics," Groningen, 1903.

II.  The  use  of  half  one  normal  curve  for  positive  and  half 
another  normal  curve  for  negative  frequencies. —  Obviously,  there  is 
no  theoretical  basis,  for  each  separate  normal  curve  has  its  meaning 
based  on  certain  assumptions,  but  the  two  parts  become  meaningless. 
The  use  of  a  part  of  a  normal  curve  for  a  complete  series  is  empirical ; 
our  own  use  of  part  of  one  on  p.  95  is  so  too,  but  we  did  not  adopt 
it  for  actual  curve  fitting,  but  as  a  hypothetical  series  of  numbers. 
The  method  cannot  give  suitable  curves  for  graduating  the  examples 
of  Type  II.  or  Type  III.,  nor  a  curve  rising  abruptly  from  the 
axis  of  x. 

III. The use of the series $y=A_0\phi(x)+A_3\phi'''(x)+A_4\phi^{iv}(x)+\dots$ where $\phi(x)=\dfrac{1}{\sigma\sqrt{2\pi}}e^{-(x-b)^2/2\sigma^2}$. The curve has been given by Edgeworth, Camb. Phil. Trans., vol. xx., pp. 36-65, 113-141, C. V. L. Charlier, "Researches into the Theory of Probability," Meddelanden från Lunds Astronomiska Observatorium, Lund, 1906, and T. N. Thiele, "Theory of Observations," London, 1903, either in this form, or as

$$y=\phi(x)\{1+\beta_3[(x-b)^3-3c(x-b)]+\beta_4[(x-b)^4-6c(x-b)^2+3c^2]+\dots\}^*$$

which might be developed as $y=\phi(x)[a_0+a_1x+a_2x^2\dots]$. Charlier uses the method of moments for fitting the curve and, using our notation, $b=\mu_1'$, $\sigma^2=\mu_2$, $A_3=-\dfrac{\mu_3}{3!}$, $A_4=\dfrac{\mu_4-3\mu_2^2}{4!}$, &c.; he gives tables of $\sigma\phi(x)$, $\sigma^4\phi'''(x)$ and $\sigma^5\phi^{iv}(x)$, and writes his formula as

$$y=\frac{N}{\sigma}\left\{\sigma\phi(x)+\frac{A_3}{\sigma^3}\,\sigma^4\phi'''(x)+\frac{A_4}{\sigma^4}\,\sigma^5\phi^{iv}(x)\right\}.$$

An example will be of help. Taking our example of Type IV., we find the mean 44.5772339, $\sigma=2.127818$, $\dfrac{N}{\sigma}=4302.0$, $-\dfrac{\mu_3}{\sigma^3\,3!}=.012208$ and $\dfrac{1}{4!}\left(\dfrac{\mu_4}{\sigma^4}-3\right)=.007079$, and using Charlier's tables, we obtain the following:


Central      x =                    First      Second      Third      Sum of three previous cols.
Age X    (X - 44.57723)/5σ          Term.      Term.       Term.      multiplied by 4302.0.

   5        -3.7200                 .0004     +.0002      +.0003              4
  10        -3.2500                 .0020     +.0006      +.0007             14
  15        -2.7800                 .0084     +.0013      +.0010             46
  20        -2.3100                 .0277     +.0018      -.0001            126
  25        -1.8401                 .0734     +.0006      -.0030            306
  30        -1.3701                 .1561     -.0030      -.0052            637
  35         -.9002                 .2661     -.0064      -.0023          1,108
  40         -.4302                 .3637     -.0054      +.0050          1,563
  45         +.0397                 .3986     +.0006      +.0084          1,753
  50          .5097                 .3503     +.0060      +.0037          1,548
  55          .9797                 .2468     +.0060      -.0032          1,075
  60         1.4496                 .1394     +.0022      -.0047            589
  65         1.9196                 .0632     -.0010      -.0025            256
  70         2.3895                 .0229     -.0018      +.0002             92
  75         2.8595                 .0067     -.0012      +.0010             29
  80         3.3295                 .0015     -.0005      +.0007              7
  85         3.7994                 .0003     -.0001      +.0003              2
  90         4.2693                 .0000     -.0000      +.0001              1

                                                                          9,156
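The three columns of the table can be recomputed without Charlier's tables. The sketch below is illustrative and rests on stated assumptions: the function $\phi$ is taken in standard measure, the derivatives are written out as $\phi'''(x)=(3x-x^3)\phi(x)$ and $\phi^{iv}(x)=(x^4-6x^2+3)\phi(x)$, the constants are those quoted in the example, and the names `terms` and `graduated` are hypothetical.

```python
import math

MEAN = 44.57723     # mean central age, as quoted in the example
SIGMA = 2.127818    # standard deviation in units of the five-year grouping
SCALE = 4302.0      # N / sigma
C3 = 0.012208       # coefficient of the third-derivative term
C4 = 0.007079       # coefficient of the fourth-derivative term

def phi(x):
    """Normal curve in standard measure."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def terms(x):
    """First, second and third terms of the series at standardised abscissa x."""
    first = phi(x)
    second = C3 * (3 * x - x ** 3) * first           # third derivative of phi
    third = C4 * (x ** 4 - 6 * x ** 2 + 3) * first   # fourth derivative of phi
    return first, second, third

def graduated(age):
    """Graduated frequency at a central age."""
    x = (age - MEAN) / (5 * SIGMA)
    return SCALE * sum(terms(x))

if __name__ == "__main__":
    for age in range(5, 95, 5):
        print(age, round(graduated(age)))
```

Figures obtained in this way may differ by a unit or so from the printed column, which was formed from four-figure tabular values.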


•  Thiele  obtains  the  equation  by  writing  e-<*-6>'  ^' =  e-lyl  -^e^'^'e'^'l2^ , 
and then expands the last term by Maclaurin's theorem. Charlier in "Ueber das Fehlergesetz," Arkiv för Matematik, vol. ii., Stockholm, 1905, adopts a method which follows that of Laplace. Edgeworth gives more than one method of reaching the formula.
This graduation, which shows the method in a suitable application,
gives  a  result  which  is  less  probable  than  the  Type  IV.  graduation  as 
judged  by  the  test  in  Chapter  IX.,  though  the  difference  is  entirely 
due  to  the  bad  agreement  in  the  age  5  group.  With  the  Type  II. 
example  the  equation  is 

^  =  12M-4{o-^(.i)--0081o-VOr)--01882o-5^>iyOr)} 

which  gives  a  fair  result  that  is  improved  into  an  excellent  one  by 
omitting  the  middle  term,  but  negative  frequencies  arise  (see  below). 
The  formula  fails  to  graduate  distributions  such  as  our  example  for 
Type  I.  or  Example  I.  of  Table  I.,  and  for  these  cases  Charlier 
suggests  the  use  of  the  series 

IV. $y=F(x\omega+c)=B_0\psi(x)+B_1\Delta\psi(x)+B_2\Delta^2\psi(x)+\dots$
where

$$\psi(x)=\frac{e^{-\lambda}\sin\pi x}{\pi}\left[\frac{1}{x}-\frac{\lambda}{1!\,(x-1)}+\frac{\lambda^2}{2!\,(x-2)}-\dots\right]$$

$$=\frac{e^{-\lambda}\lambda^x}{x!}$$

when x is a positive integer (cf. Thiele, Theory of Observations, p. 21). Charlier's fitting of this curve is arbitrary as he gives four methods of solution according as we assume certain values for $\omega$, c, $B_0$, &c. Thus one solution is based on $\omega=1$ and $c=0$, while another, by assuming $B_1=B_2=B_3=0$ finds $\lambda$, $\omega$, and c. He only gives two examples because two points in connection with its application have still to be cleared up; a third point to which he does not refer is that a statistical criterion depending on the moments is required to show which series is to be used and which solution of IV. is to be taken.

Apart,  however,  from  these  points  the  use  of  a  series  seems  open 
to  many  objections  ;  for  if  one  of  the  later  coefficients  should  have 
a  large  value  the  neglect  of  later  terms  may  involve  considerable  error 
while  the  higher  moments  which  are  necessary  to  find  these  coefficients 
are  untrustworthy  owing  to  their  large  probable  errors.  The  use  of 
a  limited  number  of  terms  of  such  series  as  those  suggested,  may 
lead  to  negative  frequency  which  is  objectionable  from  the  practical 
point  of  view  and  hard  to  reconcile  with  sound  theoretical  treatment. 
Edgeworth's  well-known  curve 


is merely the first two terms of the series III., and its inability to graduate distributions having considerable skewness is rather accentuated by Charlier's recent work in which he often finds the next term of the series significant.
APPENDIX V.


BOOKS, REFERENCES, &c.


The following list covers the principal papers to which reference has been made, though it is by no means a complete bibliography. It will, however, be found to cover those papers most likely to prove of value or interest to actuarial students.

THEORETICAL PAPERS, &c.

Biometrika (Editorials):

"On the Probable Errors of Frequency Constants." Biom., vol. ii., pp. 273, et seq.

"Elementary Proof of Sheppard's Formulæ, &c." Biom., vol. iii., pp. 308, et seq.

Blakeman, J.:

"On Tests for Linearity of Regression in Frequency Distributions." Biom., vol. iv., pp. 332, et seq.

Blakeman, J., and Pearson, K.:

"On the Probable Error of Mean Square Contingency." Biom., vol. v., pp. 191, et seq.

Davenport, C. B.:

"Statistical Methods." New York: John Wiley & Sons; London: Chapman & Hall, 1904.

Galton, F.:

"Correlations and their Measurement." Proc. Roy. Soc., vol. xlv., pp. 136-145.

Pearson, Karl:

"Skew Variation in Homogeneous Material." Phil. Trans. A., vol. clxxxvi., pp. 343, et seq., and a supplement in Phil. Trans. A., vol. cxcvii., pp. 443-459.

"Regression, Heredity and Panmixia." Phil. Trans. A., vol. clxxxvii., pp. 253-318.

"On a Form of Spurious Correlation which may arise when Indices are used, &c." Proc. Roy. Soc., vol. lx., pp. 489-498.

"On the criterion that a given system of Deviations from the Probable in the case of a Correlated System of Variables is such that it can be reasonably supposed to have arisen from Random Sampling." Phil. Mag., July, 1900.

"On the Correlation of Characters not quantitatively measurable." Phil. Trans. A., vol. cxcv., pp. 1-47.

"On the Lines and Planes of Closest Fit to Systems of Points in Space." Phil. Mag., November, 1901.

"On the Mathematical Theory of Errors of Judgment, with special reference to the Personal Equation." Phil. Trans. A., vol. cxcviii., pp. 235-299.

"On the Influence of Natural Selection on the Variability and Correlation of Organs." Phil. Trans. A., vol. cc., pp. 1-66.

"On a General Theory of the Method of False Position." Phil. Mag., June, 1903.

"On the Theory of Contingency and its relation to Association and Normal Correlation." Drapers' Company Research Memoir: Dulau & Co., 1904.

"On the General Theory of Skew Correlation and Non-linear Regression." Drapers' Company Research Memoir: Dulau & Co., 1905.

"On the Systematic Fitting of Curves to Observations and Measurements." Biom., vol. i., pp. 265, et seq.; and vol. ii., pp. 1, et seq.

Pearson, Karl, and Filon, L. N. G.:

"On the Probable Errors of Frequency Constants and on the Influence of Random Selection on Variation and Correlation." Phil. Trans. A., vol. cxci., pp. 229-311.

Sheppard, W. F.:

"On the Application of the Theory of Error to Cases of Normal Distribution and Normal Correlation." Phil. Trans. A., vol. cxcii., pp. 101-167.

"On the Calculation of the Most Probable Values of the

Professor Pearson informs us that he hopes to publish very shortly a volume of copyright tables for the use of statisticians, and it has therefore been decided not to include any tables in this volume

Sheppard, W. F.:
pp. 174, et seq.

There are also tables of the following functions in Davenport's "Statistical Methods":

Smaller tables than Sheppard's of the Normal Curve of Error.
Table of log Γ(p).

  p             0     1     2     3     4     5     6     7     8     9

 1.00  1̄.99         9750  9500  9251  9003  8755  8509  8263  8017  7773
 1.01  1̄.99  7529  7285  7043  6801  6560  6320  6080  5841  5602  5365
 1.02  1̄.99  5128  4892  4656  4421  4187  3953  3721  3489  3257  3026
 1.03  1̄.99  2796  2567  2338  2110  1883  1656  1430  1205  0981  0757
 1.04  1̄.99  0533  0311  0089  9868  9647  9427  9208  8989  8772  8554
 1.05  1̄.98  8338  8122  7907  7692  7478  7265  7053  6841  6629  6419
 1.06  1̄.98  6209  6000  5791  5583  5376  5169  4963  4758  4553  4349
 1.07  1̄.98  4145  3943  3741  3539  3338  3138  2939  2740  2541  2344
 1.08  1̄.98  2147  1951  1755  1560  1365  1172  0978  0786  0594  0403
 1.09  1̄.98  0212  0022  9833  9644  9456  9269  9082  8896  8710  8525
 1.10  1̄.97  8341  8157  7974  7791  7610  7428  7248  7068  6888  6709
 1.11  1̄.97  6531  6354  6177  6000  5825  5650  5475  5301  5128  4955
 1.12  1̄.97  4783  4612  4441  4271  4101  3932  3764  3596  3429  3262
 1.13  1̄.97  3096  2931  2766  2602  2438  2275  2113  1951  1790  1629
 1.14  1̄.97  1469  1309  1150  0992  0835  0677  0521  0365  0210  0055
 1.15  1̄.96  9901  9747  9594  9442  9290  9139  8988  8838  8688  8539
 1.16  1̄.96  8390  8243  8096  7949  7803  7658  7513  7369  7225  7082
 1.17  1̄.96  6939  6797  6655  6514  6374  6234  6095  5957  5818  5681
 1.18  1̄.96  5544  5408  5272  5137  5002  4868  4734  4601  4469  4337
 1.19  1̄.96  4205  4075  3944  3815  3686  3557  3429  3302  3175  3048
 1.20  1̄.96  2922  2797  2672  2548  2425  2302  2179  2057  1936  1815
 1.21  1̄.96  1695  1575  1456  1337  1219  1101  0984  0867  0751  0636
 1.22  1̄.96  0521  0407  0293  0180  0067  9955  9843  9732  9621  9511
 1.23  1̄.95  9401  9292  9184  9076  8968  8861  8755  8649  8544  8439
 1.24  1̄.95  8335  8231  8128  8025  7923  7821  7720  7620  7520  7420
 1.25  1̄.95  7321  7223  7125  7027  6930  6834  6738  6642  6547  6453
 1.26  1̄.95  6359  6267  6173  6081  5989  5898  5807  5716  5627  5537
 1.27  1̄.95  5449  5360  5273  5185  5099  5013  4927  4842  4757  4673
 1.28  1̄.95  4589  4506  4423  4341  4259  4178  4097  4017  3938  3858
 1.29  1̄.95  3780  3702  3624  3547  3470  3394  3318  3243  3168  3094
 1.30  1̄.95  3020  2947  2874  2802  2730  2659  2588  2518  2448  2379
 1.31  1̄.95  2310  2242  2174  2106  2040  1973  1907  1842  1777  1712
 1.32  1̄.95  1648  1585  1522  1459  1397  1336  1275  1214  1154  1094
 1.33  1̄.95  1035  0977  0918  0861  0803  0747  0690  0634  0579  0524
 1.34  1̄.95  0470  0416  0362  0309  0257  0205  0153  0102  0051  0001
 1.35  1̄.94  9951  9902  9853  9805  9757  9710  9663  9617  9571  9525
 1.36  1̄.94  9480  9435  9391  9348  9304  9262  9219  9178  9136  9095
 1.37  1̄.94  9054  9015  8975  8936  8898  8859  8822  8785  8748  8711
 1.38  1̄.94  8676  8640  8605  8571  8537  8503  8470  8437  8405  8373
 1.39  1̄.94  8342  8311  8280  8250  8221  8192  8163  8135  8107  8080
 1.40  1̄.94  8053  8026  8000  7975  7950  7925  7901  7877  7854  7831
 1.41  1̄.94  7808  7786  7765  7744  7723  7703  7683  7664  7645  7626
 1.42  1̄.94  7608  7590  7573  7556  7540  7524  7509  7494  7479  7465
 1.43  1̄.94  7451  7438  7425  7413  7401  7389  7378  7368  7358  7348
 1.44  1̄.94  7338  7329  7321  7312  7305  7298  7291  7284  7278  7273
 1.45  1̄.94  7268  7263  7259  7255  7251  7248  7246  7244  7242  7241
 1.46  1̄.94  7240  7239  7239  7240  7241  7242  7243  7245  7248  7251
 1.47  1̄.94  7254  7258  7262  7266  7271  7277  7282  7289  7295  7302
 1.48  1̄.94  7310  7317  7326  7334  7343  7353  7363  7373  7384  7395
 1.49  1̄.94  7407  7419  7431  7444  7457  7471  7485  7499  7514  7529
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Table  of  log  r(p) — continued. 


p 

0 

1   |   2 

3 

4     5 

6 

7 

8 

9 

1-50 

1-94   7545 

7561  |  7577 

7594 

7612  |  7629 

7647 

7666 

7685 

7704 
7919 
8174 
8468 

1-51 

1-52 
1-53 

1-94   7724 
1-94   7913 
1-91  8201 

7744 

7967 
8229 

7761 
7991 
8258 

7785 
8016 

8287 

7806 
8041 
8316 

7828 
8067 
8316 

7850 
8093 
8376 

7873 
8120 
8406 

7896 
8146 
8437 

1-54 

1'55 
1-56 

1-94   8500 
1-94  !  8837 
1-94  9211 

8532 
8873 
9254 

8561 
8910 
9294 

8597 
8946 
9334 

8630 
8983 
9375 

8664 
9021 
9117 

8698 
9059 
9458 

8732 
9097 
9500 

8767 
9135 
9543 

8802 
9174 
9586 
0035 
0522 
1047 

1'57 
1-58 
1-59 

1-94 

1-95 
1-95 

96^9 
0082 
0573 

9672   9716 
0130  !■  0177 
062  4  0676 

9701 
0225 
0728 

9806  j  9851 
0274  0323 
0780  0833 

9896 
0372 
0886 

9912 
0422 
0939 

9989 
0472 
0993 

1-60 

1-95  !  1102 

1157   1212 

1268 

1321  1  1380 

1437 

1494   1552 

1610 

1.61    1̄.95 1668   1727  1786  1845  1905  1965  2025  2086  2147  2209
1.62    1̄.95 2271   2333  2396  2459  2522  2586  2650  2715  2780  2845
1.63    1̄.95 2911   2977  3043  3110  3177  3244  3312  3380  3449  3517

1.64    1̄.95 3587   3656  3726  3797  3867  3938  4010  4081  4154  4226
1.65    1̄.95 4299   4372  4446  4519  4594  4668  4743  4819  4894  4970
1.66    1̄.95 5047   5124  5201  5278  5356  5434  5513  5592  5671  5750

1.67    1̄.95 5830   5911  5991  6072  6154  6235  6317  6400  6482  6566
1.68    1̄.95 6649   6733  6817  6901  6986  7072  7157  7243  7329  7416
1.69    1̄.95 7503   7590  7678  7766  7854  7943  8032  8122  8211  8301

1.70    1̄.95 8391   8482  8573  8664  8756  8848  8941  9034  9127  9220

1.71    1̄.95 9314   9409  9502  9598  9693  9788  9884  9980  1̄.96 0077  0174
1.72    1̄.96 0271   0369  0467  0565  0664  0763  0862  0961  1061  1162
1.73    1̄.96 1262   1363  1464  1566  1668  1770  1873  1976  2079  2183
1.74    1̄.96 2287   2391  2496  2601  2706  2812  2918  3024  3131  3238
1.75    1̄.96 3345   3453  3561  3669  3778  3887  3996  4105  4215  4326
1.76    1̄.96 4436   4547  4659  4770  4882  4994  5107  5220  5333  5447
1.77    1̄.96 5561   5675  5789  5904  6019  6135  6251  6367  6484  6600
1.78    1̄.96 6718   6835  6953  7071  7189  7308  7427  7547  7666  7787
1.79    1̄.96 7907   8028  8149  8270  8392  8514  8636  8759  8882  9005

1.80    1̄.96 9129   9253  9377  9501  9626  9751  9877  1̄.97 0003  0129  0255

1.81    1̄.97 0383   0509  0637  0765  0893  1021  1150  1279  1408  1538
1.82    1̄.97 1668   1798  1929  2060  2191  2322  2454  2586  2719  2852
1.83    1̄.97 2985   3118  3252  3386  3520  3655  3790  3925  4061  4197

1.84    1̄.97 4333   4470  4606  4744  4881  5019  5157  5295  5434  5573
1.85    1̄.97 5712   5852  5992  6132  6273  6414  6555  6697  6838  6980
1.86    1̄.97 7123   7266  7408  7552  7696  7840  7984  8128  8273  8419
1.87    1̄.97 8564   8710  8856  9002  9149  9296  9443  9591  9739  9887
1.88    1̄.98 0036   0184  0333  0483  0633  0783  0933  1084  1234  1386
1.89    1̄.98 1537   1689  1841  1994  2147  2299  2453  2607  2761  2915

1.90    1̄.98 3069   3224  3379  3535  3690  3846  4003  4159  4316  4474

1.91    1̄.98 4631   4789  4947  5105  5264  5423  5582  5742  5902  6062
1.92    1̄.98 6223   6383  6544  6706  6867  7029  7192  7354  7517  7680
1.93    1̄.98 7844   8007  8171  8336  8500  8665  8830  8996  9161  9327

1.94    1̄.98 9494   9660  9827  9995  1̄.99 0162  0330  0498  0666  0835  1004
1.95    1̄.99 1173   1343  1512  1683  1853  2024  2195  2366  2537  2709
1.96    1̄.99 2881   3054  3227  3399  3573  3746  3920  4094  4269  4443

1.97    1̄.99 4618   4794  4969  5145  5321  5498  5674  5851  6029  6206
1.98    1̄.99 6384   6562  6740  6919  7098  7277  7457  7637  7817  7997
1.99    1̄.99 8178   8359  8540  8722  8903  9085  9268  9450  9633  9816

p       0           1     2     3     4     5     6     7     8     9
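
The entries of the foregoing table may be checked by direct computation. The following is a minimal sketch only, and forms no part of the original work: it assumes Python's standard `math.lgamma` (the natural logarithm of the absolute value of Γ), and the helper names `log10_gamma` and `bar_form` are hypothetical, introduced here for illustration. It converts the natural logarithm to base 10 and splits the result into a negative characteristic and a positive six-figure mantissa, which is how the table prints its values (1̄.94 7545 meaning -1 + 0.947545).

```python
import math


def log10_gamma(p: float) -> float:
    """Common (base-10) logarithm of Gamma(p), for p > 0."""
    # math.lgamma returns the natural log; divide by ln 10 to change base.
    return math.lgamma(p) / math.log(10.0)


def bar_form(p: float) -> tuple[int, str]:
    """Split log10 Gamma(p) into (characteristic, six-digit mantissa).

    The mantissa is returned as 'dd dddd', matching the printed layout
    in which the first two figures are given once per row.
    Intended for the tabulated arguments 1.000 to 1.999.
    """
    value = log10_gamma(p)
    characteristic = math.floor(value)   # -1 whenever 1 < p < 2
    mantissa = value - characteristic    # lies in [0, 1)
    digits = f"{round(mantissa * 1_000_000):06d}"
    return characteristic, f"{digits[:2]} {digits[2:]}"


if __name__ == "__main__":
    for p in (1.50, 1.60):
        c, m = bar_form(p)
        print(f"p = {p:.3f}: characteristic {c}, mantissa .{m}")
```

For an argument such as p = 1.500 the sketch returns the characteristic -1 and the mantissa figures 94 7545, agreeing with the tabulated entry; any argument of the table can be checked in the same way.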
ACTUARIAL  TERMS,  xi-xiii.

ADJUSTMENT  OF  MOMENTS,  24-30,  102-104. 

ALLIN,  S.  H.  J.  W.,  33. 

APPROXIMATION  TO  ROOT  OF  EQUATION,  128,  129.

AREAS— 

for  curves,  48,  49,  58,  63,  104,  105. 

moments  of  a  system  of,  28-30,  102-104. 
ARRAY,  107. 

B-FUNCTIONS,  59,  85,  App.  II. 
BEETON,  M.,  150. 

CALCULATION  OF  CURVES,  ch.  V.,  161,  162. 

CHARLIER,  F.  V.  L.,  App.  IV. 

COEFFICIENTS  OF  CORRELATION,  &c.  (see  CORRELATION,  &c.).

CONSTANTS  - 

for  curves,  ch.  I.,  ch.   V. 

table  of,  App.  I. 
CONTACT,  28. 
CONTINGENCY- 

and  correlation,  145-148. 

mean,  vii,  viii. 

mean  square,  vii,  viii.,  ch.  X. 

probable  error  of  mean  square,  149. 

theory  of,  ch.  X. 
CORRELATION,  ch.  VI.,  VII.,   X.,  App.  III. 

and  actuarial  work,  vi.,  150. 

and  contingency,  145-148. 

coefficient  of,  112,  115,  116,  119,  128-130. 

probable  error  of  coefficient  of,  136-138. 

spurious,  122-124. 
CRITERION— 

for  type  of  curve,  42-47,  50.

for  goodness  of  fit,  ch.  IX. 
DEATHS,  graduation  of  statistics,  79,  80.
DIAGRAMS— 

construction  of,  5-8,  49,  72,  80,  89. 

reproduction  of,  6,  7,  11,  12,  57,  64,  67,  73,  81,  86,  90,  108,  114. 

EDGEWORTH,  F.  Y.,  App.  IV. 
ENDOWMENT  ASSURANCES - 

check  valuation  of,  121. 

and  correlation  and  contingency,  106-109,  117-122,  145-149. 
ENTRANTS,  statistics  of,  84-86. 
EXISTING,  statistics  of,  8. 
EXPOSED  TO  RISK— 

hypothetical,  94-100. 

statistics  of,  8,  54-58. 

FREQUENCY-CURVES— 

desiderata,  ch.  I.,  36. 

Pearson's  system,  ch.  IV. 

other  systems,  App.  IV. 

table  of,  47. 
FREQUENCY  DISTRIBUTIONS,  ch.  II. 

G-FUNCTIONS,  70,  71,  74-77. 
GALTON,  F.,  116  (footnote). 
GAMMA  FUNCTIONS,  54,  56,  App.  II.- 
Γ(p)  when  p  <  1,  54.

Γ(½),  App.  II.

table  of,  166,  167. 
GEOMETRICAL  PROGRESSION,  2,  5,  103. 
GOODNESS  OF  FIT,  ch.  IX. 
GRADUATION— 

36,  ch.  V.,  App.  IV. 

of  rates,  92-100. 
GRAPHICAL  REPRESENTATION— 

of  curves,  72,  80,  89.

of  distributions,  5-8. 

HARDY,  G.  F.,  19,  54,  62,  93,  99,  100,  119-121,  139  (footnote). 
HYPERGEOMETRICAL  SERIES,  37,  38. 

INDICES,  dangers  of  using,  122-124. 

KAPTEYN,  J.  C.,  App.  IV.
KING,  G.,  79,  143,  144.

LEAST  SQUARES,  Method  of,  viii. 

LEE,  A.,  71. 

LEES,  M.  M.,  93. 

LIDSTONE,  G.  J.,  xiii.,  22  (footnote),  62,  121.

LOADING  FOR  EMERGENCIES,  133. 
MACDONELL,  W.  R.,  125.

MAKEHAM'S  HYPOTHESIS,  13,  98-100,  144. 

MARRIAGE  STATISTICS,  66,  67,  93-96. 

MASCULINITY,  132,  133. 

MEAN,  9— 

distance  between  mean  and  mode,  41. 

probable  error  of,  135,  136. 
MEDIAN,  viii. 
MODE,  9— 

distance  between  mean  and  mode,  41.

position  of,  47. 
MODEL  OFFICE,  for  endowment  assurances,  122. 

King's  statistics,  143,  144. 
MOMENTS— 

adjustment  of,  24-30,  102-104. 

formulas  for  change  of  origin,  17-19,  117. 

method  of,  ch.  III.,  112,  113,  115-121. 

notation  for,  16. 

summation  method,  19-23,  54,  102,  119-121.

NEWTON'S  APPROXIMATION  TO  ROOT  OF  EQUATION,  128  (footnote),  129.
"NORMAL  CURVE  OF  ERROR,"  45,  46,  87-91,  App.  III.,  IV.

Onm(5)  TABLE,  graduation  of,  96-100.
ORDINATES— 

loaded,   16. 

mid-,  16,  48,  63. 

moments  of  system  of,  15,  24-28. 

PARABOLA,  fitting  of,  13-15,  30-34. 
PEARSON,  K.,  v.,  vi.,  ix.,  x.,  14,  128,  ch.  IX. 

system  of  curves,  ch.  IV.,  App.  IV. 
PENSION  FUND  STATISTICS,  33,  34,  150. 
PROBABILITY,  connection  with  curve  fitting,  ch.  I.,  37-39. 

Integral  (see  also  "Normal  curve  of  error").
PROBABLE  ERRORS,  41,  ch.  VIII.,  149.

QUADRATURE  FORMULAE,  25-27,  48,  58,  63,  130. 

RATES,  graduation  of,  92-100. 
REGRESSION,  113,  116. 

SEX-RATIO,  132,  133. 
SHEPPARD,  W.  F.,  29,  89,  95. 
SICKNESS  TABLES,  graduation,  &c.,  70-72.
SKEWNESS,  11,  41,  49. 
SPURIOUS  CORRELATION,  122-124.
STANDARD  DEVIATION,  10-12,  ch.  VIII. 

of  parallel  arrays,  113. 

and  probable  errors,  131,  132. 
SUMMATION  METHOD  OF  FINDING  MOMENTS,  19-23,  54,  102,  119-121.